POSITION DEPENDENT RECOGNITION OF
GNN NUCLEOTIDE TRIPLETS BY ZINC FINGERS 

CROSS-REFERENCES TO RELATED APPLICATIONS 
The present application is a continuation-in-part of copending U.S. Patent 
Application Serial No. 09/535,008, filed March 23, 2000, which application claims the 
benefit of U.S. provisional applications 60/126,238, filed March 24, 1999, 60/126,239 
filed March 24, 1999, 60/146,595 filed July 30, 1999 and 60/146,615 filed July 30, 1999. 
The present application is also a continuation-in-part of copending U.S. Patent 
Application Serial No. 09/716,637, filed November 20, 2000. The disclosures of all of 
the aforementioned applications are hereby incorporated by reference in their entireties
for all purposes. 

BACKGROUND 

Zinc finger proteins (ZFPs) are proteins that can bind to DNA in a sequence-specific
manner. Zinc fingers were first identified in the transcription factor TFIIIA from
the oocytes of the African clawed toad, Xenopus laevis. An exemplary motif
characterizing one class of these proteins (C2H2 class) is -Cys-(X)2-4-Cys-(X)12-His-(X)3-5-
His (where X is any amino acid) (SEQ ID NO:1). A single finger domain is about 30
amino acids in length, and several structural studies have demonstrated that it contains an 
alpha helix containing the two invariant histidine residues and two invariant cysteine 
residues in a beta turn co-ordinated through zinc. To date, over 10,000 zinc finger 
sequences have been identified in several thousand known or putative transcription 
factors. Zinc finger domains are involved not only in DNA-recognition, but also in RNA 
binding and in protein-protein binding. Current estimates are that this class of molecules 
will constitute about 2% of all human genes. 
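
As an illustration only, the spacing rules of the C2H2 motif given above can be written as a regular expression over one-letter amino acid codes. The following is a minimal Python sketch; the names `C2H2_MOTIF` and `find_c2h2_motifs` and the example sequence are hypothetical and are introduced here only for illustration.

```python
import re

# Cys-(X)2-4-Cys-(X)12-His-(X)3-5-His, where X is any amino acid (one-letter codes).
C2H2_MOTIF = re.compile(r"C.{2,4}C.{12}H.{3,5}H")

def find_c2h2_motifs(protein_seq):
    """Return (0-based start, matched text) for each non-overlapping motif match."""
    return [(m.start(), m.group()) for m in C2H2_MOTIF.finditer(protein_seq.upper())]

# Hypothetical sequence built only to satisfy the spacing rules; not a real zinc finger.
example = "MK" + "C" + "AA" + "C" + "A" * 12 + "H" + "AAA" + "H" + "TG"
hits = find_c2h2_motifs(example)
```

A match of this pattern indicates only that the cysteine and histidine spacing is satisfied; it does not by itself establish that a sequence folds or binds as a zinc finger.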

The x-ray crystal structure of Zif268, a three-finger domain from a murine 
transcription factor, has been solved in complex with a cognate DNA sequence and 
shows that each finger can be superimposed on the next by a periodic rotation. The 
structure suggests that each finger interacts independently with DNA over 3 base-pair
intervals, with side-chains at positions -1, 2, 3 and 6 on each recognition helix making
contacts with their respective DNA triplet subsites. The amino terminus of Zif268 is
situated at the 3' end of the DNA strand with which it makes most contacts. Some zinc
fingers can bind to a fourth base in a target segment. If the strand with which a zinc 
finger protein makes most contacts is designated the target strand, some zinc finger 
proteins bind to a three base triplet in the target strand and a fourth base on the nontarget 
strand. The fourth base is complementary to the base immediately 3' of the three base 
subsite. 

The structure of the Zif268-DNA complex also suggested that the DNA sequence 
specificity of a zinc finger protein might be altered by making amino acid substitutions at 
the four helix positions (-1, 2, 3 and 6) on each of the zinc finger recognition helices. 
Phage display experiments using zinc finger combinatorial libraries to test this 
observation were published in a series of papers in 1994 (Rebar et al., Science 263,
671-673 (1994); Jamieson et al., Biochemistry 33, 5689-5695 (1994); Choo et al., PNAS 91,
11163-11167 (1994)). Combinatorial libraries were constructed with randomized
side-chains in either the first or middle finger of Zif268 and then used to select for an altered
Zif268 binding site in which the appropriate DNA sub-site was replaced by an altered 
DNA triplet. Further, correlation between the nature of introduced mutations and the 
resulting alteration in binding specificity gave rise to a partial set of substitution rules for 
design of ZFPs with altered binding specificity. 

Greisman & Pabo, Science 275, 657-661 (1997) discuss an elaboration of the 
phage display method in which each finger of a Zif268 was successively randomized and 
selected for binding to a new triplet sequence. This paper reported selection of ZFPs for a 
nuclear hormone response element, a p53 target site and a TATA box sequence. 

A number of papers have reported attempts to produce ZFPs to modulate 
particular target sites. For example, Choo et al., Nature 372, 645 (1994), report an
attempt to design a ZFP that would repress expression of a bcr-abl oncogene. The target
segment to which the ZFPs would bind was a nine base sequence 5'GCA GAA GCC3'
chosen to overlap the junction created by a specific oncogenic translocation fusing the
genes encoding bcr and abl. The intention was that a ZFP specific to this target site
would bind to the oncogene without binding to abl or bcr component genes. The authors
used phage display to screen a mini-library of variant ZFPs for binding to this target
segment. A variant ZFP thus isolated was then reported to repress expression of a stably
transfected bcr-abl construct in a cell line.

Pomerantz et al., Science 267, 93-96 (1995) reported an attempt to design a novel 
DNA binding protein by fusing two fingers from Zif268 with a homeodomain from Oct-1.
The hybrid protein was then fused with a transcriptional activator for expression as a
chimeric protein. The chimeric protein was reported to bind a target site representing a 
hybrid of the subsites of its two components. The authors then constructed a reporter 
vector containing a luciferase gene operably linked to a promoter and a hybrid site for the 
chimeric DNA binding protein in proximity to the promoter. The authors reported that 
their chimeric DNA binding protein could activate expression of the luciferase gene. 

Liu et al., PNAS 94, 5525-5530 (1997) report forming a composite zinc finger 
protein by using a peptide spacer to link two component zinc finger proteins each having 
three fingers. The composite protein was then further linked to a transcriptional activation
domain. It was reported that the resulting chimeric protein bound to a target site formed
from the target segments bound by the two component zinc finger proteins. It was further 
reported that the chimeric zinc finger protein could activate transcription of a reporter 
gene when its target site was inserted into a reporter plasmid in proximity to a promoter 
operably linked to the reporter. 

Choo et al., WO 98/53058, WO98/53059, and WO 98/53060 (1998) discuss 
selection of zinc finger proteins to bind to a target site within the HIV Tat gene. Choo et 
al. also discuss selection of a zinc finger protein to bind to a target site encompassing a
site of a common mutation in the oncogene ras. The target site within ras was thus 
constrained by the position of the mutation. 

Previously-disclosed methods for the design of sequence-specific zinc finger 
proteins have often been based on modularity of individual zinc fingers; i.e., the ability
of a zinc finger to recognize the same target subsite regardless of the location of the
finger in a multi-finger protein. Although, in many instances, a zinc finger retains the
same sequence specificity regardless of its location within a multi-finger protein, in
certain cases, the sequence specificity of a zinc finger depends on its position. For
example, it is possible for a finger to recognize a particular triplet sequence when it is
present as finger 1 of a three-finger protein, but to recognize a different triplet sequence
when present as finger 2 of a three-finger protein. 

Attempts to address situations in which a zinc finger behaves in a non-modular 
fashion (i.e., its sequence specificity depends upon its location in a multi-finger protein)
have, to date, involved strategies employing randomization of key binding residues in
multiple adjacent zinc fingers, followed by selection. See, for example, Isalan et al.
(2001) Nature Biotechnol. 19:656-660. However, methods for rational design of 
polypeptides containing non-modular zinc fingers have not heretofore been described. 

SUMMARY 

The present disclosure provides compositions comprising and methods involving 
position dependent recognition of GNN nucleotide triplets by zinc fingers. 

Thus, provided herein is a zinc finger protein that binds to a target site, said zinc
finger protein comprising a first (F1), a second (F2), and a third (F3) zinc finger, ordered
F1, F2, F3 from N-terminus to C-terminus, said target site comprising, in 3' to 5'
direction, a first (S1), a second (S2), and a third (S3) target subsite, each target subsite
having the nucleotide sequence GNN, wherein if S1 comprises GAA, F1 comprises the
amino acid sequence QRSNLVR; if S2 comprises GAA, F2 comprises the amino acid
sequence QSGNLAR; if S3 comprises GAA, F3 comprises the amino acid sequence
QSGNLAR; if S1 comprises GAG, F1 comprises the amino acid sequence RSDNLAR; if
S2 comprises GAG, F2 comprises the amino acid sequence RSDNLAR; if S3 comprises
GAG, F3 comprises the amino acid sequence RSDNLTR; if S1 comprises GAC, F1
comprises the amino acid sequence DRSNLTR; if S2 comprises GAC, F2 comprises the
amino acid sequence DRSNLTR; if S3 comprises GAC, F3 comprises the amino acid
sequence DRSNLTR; if S1 comprises GAT, F1 comprises the amino acid sequence
QSSNLAR; if S2 comprises GAT, F2 comprises the amino acid sequence TSGNLVR; if
S3 comprises GAT, F3 comprises the amino acid sequence TSANLSR; if S1 comprises
GGA, F1 comprises the amino acid sequence QSGHLAR; if S2 comprises GGA, F2
comprises the amino acid sequence QSGHLQR; if S3 comprises GGA, F3 comprises the
amino acid sequence QSGHLQR; if S1 comprises GGG, F1 comprises the amino acid
sequence RSDHLAR; if S2 comprises GGG, F2 comprises the amino acid sequence
RSDHLSR; if S3 comprises GGG, F3 comprises the amino acid sequence RSDHLSR; if
S1 comprises GGC, F1 comprises the amino acid sequence DRSHLRT; if S2 comprises
GGC, F2 comprises the amino acid sequence DRSHLAR; if S1 comprises GGT, F1
comprises the amino acid sequence QSSHLTR; if S2 comprises GGT, F2 comprises the
amino acid sequence TSGHLSR; if S3 comprises GGT, F3 comprises the amino acid
sequence TSGHLVR; if S1 comprises GCA, F1 comprises the amino acid sequence
QSGSLTR; if S2 comprises GCA, F2 comprises QSGDLTR; if S3 comprises GCA, F3
comprises QSGDLTR; if S1 comprises GCG, F1 comprises the amino acid sequence
RSDDLTR; if S2 comprises GCG, F2 comprises the amino acid sequence RSDDLQR; if
S3 comprises GCG, F3 comprises the amino acid sequence RSDDLTR; if S1 comprises
GCC, F1 comprises the amino acid sequence ERGTLAR; if S2 comprises GCC, F2
comprises the amino acid sequence DRSDLTR; if S3 comprises GCC, F3 comprises the
amino acid sequence DRSDLTR; if S1 comprises GCT, F1 comprises the amino acid
sequence QSSDLTR; if S2 comprises GCT, F2 comprises the amino acid sequence
QSSDLTR; if S3 comprises GCT, F3 comprises the amino acid sequence QSSDLQR; if
S1 comprises GTA, F1 comprises the amino acid sequence QSGALTR; if S2 comprises
GTA, F2 comprises the amino acid sequence QSGALAR; if S1 comprises GTG, F1
comprises the amino acid sequence RSDALTR; if S2 comprises GTG, F2 comprises the
amino acid sequence RSDALSR; if S3 comprises GTG, F3 comprises the amino acid
sequence RSDALTR; if S1 comprises GTC, F1 comprises the amino acid sequence
DRSALAR; if S2 comprises GTC, F2 comprises the amino acid sequence DRSALAR;
and if S3 comprises GTC, F3 comprises the amino acid sequence DRSALAR.

Also provided are methods of designing a zinc finger protein comprising a first
(F1), a second (F2), and a third (F3) zinc finger, ordered F1, F2, F3 from N-terminus to
C-terminus that binds to a target site comprising, in 3' to 5' direction, a first (S1), a
second (S2), and a third (S3) target subsite, each target subsite having the nucleotide
sequence GNN, the method comprising the steps of (a) selecting the F1 zinc finger such
that it binds to the S1 target subsite, wherein if S1 comprises GAA, F1 comprises the
amino acid sequence QRSNLVR; if S1 comprises GAG, F1 comprises the amino acid
sequence RSDNLAR; if S1 comprises GAC, F1 comprises the amino acid sequence
DRSNLTR; if S1 comprises GAT, F1 comprises the amino acid sequence QSSNLAR; if
S1 comprises GGA, F1 comprises the amino acid sequence QSGHLAR; if S1 comprises
GGG, F1 comprises the amino acid sequence RSDHLAR; if S1 comprises GGC, F1
comprises the amino acid sequence DRSHLRT; if S1 comprises GGT, F1 comprises the
amino acid sequence QSSHLTR; if S1 comprises GCA, F1 comprises QSGSLTR; if S1
comprises GCG, F1 comprises RSDDLTR; if S2 comprises GCG, F2 comprises
RSDDLQR; if S1 comprises GCC, F1 comprises ERGTLAR; if S1 comprises GCT, F1
comprises the amino acid sequence QSSDLTR; if S1 comprises GTA, F1 comprises the
amino acid sequence QSGALTR; if S1 comprises GTG, F1 comprises the amino acid
sequence RSDALTR; if S1 comprises GTC, F1 comprises the amino acid sequence
DRSALAR; (b) selecting the F2 zinc finger such that it binds to the S2 target subsite,
wherein if S2 comprises GAA, F2 comprises the amino acid sequence QSGNLAR; if S2
comprises GAG, F2 comprises the amino acid sequence RSDNLAR; if S2 comprises
GAC, F2 comprises the amino acid sequence DRSNLTR; if S2 comprises GAT, F2
comprises the amino acid sequence TSGNLVR; if S2 comprises GGA, F2 comprises the
amino acid sequence QSGHLQR; if S2 comprises GGG, F2 comprises the amino acid
sequence RSDHLSR; if S2 comprises GGC, F2 comprises the amino acid sequence
DRSHLAR; if S2 comprises GGT, F2 comprises the amino acid sequence TSGHLSR; if
S2 comprises GCA, F2 comprises the amino acid sequence QSGDLTR; if S2 comprises
GCC, F2 comprises the amino acid sequence DRSDLTR; if S2 comprises GCT, F2
comprises the amino acid sequence QSSDLTR; if S2 comprises GTA, F2 comprises the
amino acid sequence QSGALAR; if S2 comprises GTG, F2 comprises the amino acid
sequence RSDALSR; if S2 comprises GTC, F2 comprises the amino acid sequence
DRSALAR; and (c) selecting the F3 zinc finger such that it binds to the S3 target subsite,
wherein if S3 comprises GAA, F3 comprises the amino acid sequence QSGNLAR; if S3
comprises GAG, F3 comprises the amino acid sequence RSDNLTR; if S3 comprises
GAC, F3 comprises the amino acid sequence DRSNLTR; if S3 comprises GAT, F3
comprises the amino acid sequence TSANLSR; if S3 comprises GGA, F3 comprises the
amino acid sequence QSGHLQR; if S3 comprises GGG, F3 comprises RSDHLSR; if S3
comprises GGT, F3 comprises the amino acid sequence TSGHLVR; if S3 comprises
GCA, F3 comprises the amino acid sequence QSGDLTR; if S3 comprises GCG, F3
comprises the amino acid sequence RSDDLTR; if S3 comprises GCC, F3 comprises the
amino acid sequence DRSDLTR; if S3 comprises GCT, F3 comprises the amino acid
sequence QSSDLQR; if S3 comprises GTG, F3 comprises RSDALTR; and if S3
comprises GTC, F3 comprises the amino acid sequence DRSALAR;

thereby designing a zinc finger protein that binds to a target site. 
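
Steps (a) through (c) amount to a position-dependent table lookup, which can be sketched as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: the names `GNN_HELICES` and `design_three_finger_zfp` are hypothetical, the table holds only a subset of the triplets recited above, and the target site is assumed to be exactly nine bases written 5' to 3'.

```python
# Position-dependent recognition helices for a subset of the GNN triplets
# recited above. Keys are triplets written 5' to 3'.
# Layout: triplet -> (F1 helix, F2 helix, F3 helix).
GNN_HELICES = {
    "GAA": ("QRSNLVR", "QSGNLAR", "QSGNLAR"),
    "GAG": ("RSDNLAR", "RSDNLAR", "RSDNLTR"),
    "GAT": ("QSSNLAR", "TSGNLVR", "TSANLSR"),
    "GGA": ("QSGHLAR", "QSGHLQR", "QSGHLQR"),
    "GGG": ("RSDHLAR", "RSDHLSR", "RSDHLSR"),
    "GGT": ("QSSHLTR", "TSGHLSR", "TSGHLVR"),
    "GCA": ("QSGSLTR", "QSGDLTR", "QSGDLTR"),
    "GCG": ("RSDDLTR", "RSDDLQR", "RSDDLTR"),
    "GCC": ("ERGTLAR", "DRSDLTR", "DRSDLTR"),
    "GCT": ("QSSDLTR", "QSSDLTR", "QSSDLQR"),
    "GTG": ("RSDALTR", "RSDALSR", "RSDALTR"),
}

def design_three_finger_zfp(target_site):
    """Map a 9-base target site (5' to 3') to helices for F1, F2 and F3."""
    site = target_site.upper()
    if len(site) != 9:
        raise ValueError("expected a 9-base target site")
    # Written 5' to 3', the site reads S3-S2-S1, so S1 is the 3'-most triplet.
    s3, s2, s1 = site[0:3], site[3:6], site[6:9]
    design = {}
    for index, (finger, subsite) in enumerate((("F1", s1), ("F2", s2), ("F3", s3))):
        if subsite not in GNN_HELICES:
            raise KeyError(f"no helix listed in this sketch for subsite {subsite}")
        design[finger] = GNN_HELICES[subsite][index]
    return design

# Hypothetical target site 5'-GCA GAT GAA-3', chosen only to exercise the lookup.
example_design = design_three_finger_zfp("GCAGATGAA")
```

In this example the same triplet can yield a different helix depending on the finger position, which is the point of the position-dependent table: GAT maps to TSGNLVR at F2 but would map to QSSNLAR at F1.
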
In certain embodiments of the zinc finger proteins and methods described herein,
S1 comprises GAA and F1 comprises the amino acid sequence QRSNLVR. In other
embodiments, S2 comprises GAA and F2 comprises the amino acid sequence
QSGNLAR. In other embodiments, S3 comprises GAA and F3 comprises the amino acid
sequence QSGNLAR. In other embodiments, S1 comprises GAG and F1 comprises the
amino acid sequence RSDNLAR. In other embodiments, S2 comprises GAG and F2
comprises the amino acid sequence RSDNLAR. In other embodiments, S3 comprises
GAG and F3 comprises the amino acid sequence RSDNLTR. In other embodiments, S1
comprises GAC and F1 comprises the amino acid sequence DRSNLTR. In other
embodiments, S2 comprises GAC and F2 comprises the amino acid sequence
DRSNLTR. In other embodiments, S3 comprises GAC and F3 comprises the amino acid
sequence DRSNLTR. In other embodiments, S1 comprises GAT and F1 comprises the
amino acid sequence QSSNLAR. In other embodiments, S2 comprises GAT and F2
comprises the amino acid sequence TSGNLVR. In other embodiments, S3 comprises
GAT and F3 comprises the amino acid sequence TSANLSR. In other embodiments, S1
comprises GGA and F1 comprises the amino acid sequence QSGHLAR. In other
embodiments, S2 comprises GGA and F2 comprises the amino acid sequence
QSGHLQR. In other embodiments, S3 comprises GGA and F3 comprises the amino acid
sequence QSGHLQR. In other embodiments, S1 comprises GGG and F1 comprises the
amino acid sequence RSDHLAR. In other embodiments, S2 comprises GGG and F2
comprises the amino acid sequence RSDHLSR. In other embodiments, S3 comprises
GGG and F3 comprises the amino acid sequence RSDHLSR. In other embodiments, S1
comprises GGC and F1 comprises the amino acid sequence DRSHLTR. In other
embodiments, S2 comprises GGC and F2 comprises the amino acid sequence
DRSHLAR. In other embodiments, S1 comprises GGT and F1 comprises the amino acid
sequence QSSHLTR. In other embodiments, S2 comprises GGT and F2 comprises the
amino acid sequence TSGHLSR. In other embodiments, S3 comprises GGT and F3
comprises the amino acid sequence TSGHLVR. In other embodiments, S1 comprises
GCA and F1 comprises the amino acid sequence QSGSLTR. In other embodiments, S2
comprises GCA and F2 comprises the amino acid sequence QSGDLTR. In other
embodiments, S3 comprises GCA and F3 comprises the amino acid sequence
QSGDLTR. In other embodiments, S1 comprises GCG and F1 comprises the amino acid
sequence RSDDLTR. In other embodiments, S2 comprises GCG and F2 comprises the
amino acid sequence RSDDLQR. In other embodiments, S3 comprises GCG and F3
comprises the amino acid sequence RSDDLTR. In other embodiments, S1 comprises
GCC and F1 comprises the amino acid sequence ERGTLAR. In other embodiments, S2
comprises GCC and F2 comprises the amino acid sequence DRSDLTR. In other
embodiments, S3 comprises GCC and F3 comprises the amino acid sequence DRSDLTR.
In other embodiments, S1 comprises GCT and F1 comprises the amino acid sequence
QSSDLTR. In other embodiments, S2 comprises GCT and F2 comprises the amino acid
sequence QSSDLTR. In other embodiments, S3 comprises GCT and F3 comprises the
amino acid sequence QSSDLQR. In other embodiments, S1 comprises GTA and F1
comprises the amino acid sequence QSGALTR. In other embodiments, S2 comprises
GTA and F2 comprises the amino acid sequence QSGALAR. In other embodiments, S1
comprises GTG and F1 comprises the amino acid sequence RSDALTR. In other
embodiments, S2 comprises GTG and F2 comprises the amino acid sequence RSDALSR.
In other embodiments, S3 comprises GTG and F3 comprises the amino acid sequence
RSDALTR. In other embodiments, S1 comprises GTC and F1 comprises the amino acid
sequence DRSALAR. In other embodiments, S2 comprises GTC and F2 comprises the
amino acid sequence DRSALAR. In other embodiments, S3 comprises GTC and F3
comprises the amino acid sequence DRSALAR.

Also provided are polypeptides comprising any of the zinc finger proteins described
herein. In certain embodiments, the polypeptide further comprises at least one functional
domain. Also provided are polynucleotides encoding any of the polypeptides described
herein. Thus, also provided are nucleic acids encoding zinc fingers, including all of the
zinc fingers described above.

Also provided are segments of a zinc finger comprising a sequence of seven
contiguous amino acids as shown herein. Also provided are nucleic acids encoding any 
of these segments and zinc fingers comprising the same. 

Also provided are zinc finger proteins comprising first, second and third zinc 
fingers. The first, second and third zinc fingers comprise respectively first, second and 
third segments of seven contiguous amino acids as shown herein. Also provided are 
nucleic acids encoding such zinc finger proteins. 

BRIEF DESCRIPTION OF THE DRAWINGS 
Figure 1 shows results of site selection analysis of two representative zinc finger
proteins (leftmost 4 columns) and measurements of binding affinity for each of these
proteins to their intended target sequences and to variant target sequences (rightmost 3
columns). Analysis of ZFP1 is shown in the upper portion of the figure and analysis of
ZFP2 is shown in the lower portion of the figure. For the site selection analyses, the
amino acid sequences of residues -1 through +6 of the recognition helix of each of the
three component zinc fingers (F3, F2 and F1) are shown across the top row; the intended
target sequence (divided into finger-specific target subsites) is shown across the second
row, and a summary of the sequences bound is shown in the third row. Data for F3 is
shown in the second column, data for F2 is shown in the third column, and data for F1 is
shown in the fourth column.

For the binding affinity analyses, the designed target sequence for each ZFP
("cognate") and two related sequences ("Mt") are shown (column 6), along with the Kd
for binding of the ZFP to each of these sequences (column 7). 

Figure 2 shows amino acid sequences of zinc finger recognition regions (amino 
acids -1 through +6 of the recognition helix) that bind to each of the 16 GNN triplet 
subsites. Three amino acid sequences are shown for each trinucleotide subsite; these 
correspond to optimal amino acid sequences for recognition of the subsite from each of
the three positions (finger 1, F1; finger 2, F2; or finger 3, F3) in a three-finger zinc finger
protein. Amino acid sequences are from N-terminal to C-terminal; nucleotide sequences
are from 5' to 3'.
Also shown are site selection results for each of the 48 position-dependent
GNN-recognizing zinc fingers. These show the number of times a particular nucleotide was
present, at a given position, in a collection of oligonucleotide sequences bound by the
finger. For example, out of 15 oligonucleotides bound by a zinc finger protein with the
amino acid sequence QSGHLAR present at the finger 1 (F1) position, 15 contained a G
in the 5'-most position of the subsite, 15 contained a G in the middle position of the
subsite, while, at the 3'-most position of the subsite, 10 contained an A, 3 contained a G
and 2 contained a T. Accordingly, this particular amino acid sequence is optimal for
binding a GGA triplet from the F1 position.
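
The reading of the QSGHLAR site selection counts described above can be sketched in a few lines of Python. The function name `consensus_triplet` and the data layout are assumptions made for illustration; the counts themselves are those quoted in the preceding paragraph.

```python
def consensus_triplet(position_counts):
    """Pick the most frequent nucleotide at each subsite position (5' to 3')."""
    return "".join(max(counts, key=counts.get) for counts in position_counts)

# Counts quoted above for QSGHLAR at the F1 position (15 bound oligonucleotides).
qsghlar_f1 = [
    {"G": 15},                  # 5'-most position of the subsite
    {"G": 15},                  # middle position of the subsite
    {"A": 10, "G": 3, "T": 2},  # 3'-most position of the subsite
]
preferred = consensus_triplet(qsghlar_f1)
```

Taking the single most frequent base per position is the simplest possible summary; it discards the spread at the 3'-most position (10 A, 3 G, 2 T), which may itself be informative about specificity.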

Figures 3A, 3B and 3C show site selection data indicating positional dependence 
of GCA-, GAT- and GGT-binding zinc fingers. The first and fourth (where applicable) 
rows of each figure show portions of the amino acid sequence of a designed zinc finger 
protein. Amino acid residues -1 through +6 of each α-helix are listed from left to right.
The second and fifth (where applicable) rows show the target sequence, divided into three 
triplet subsites, one for each finger of the protein shown in the first and fourth (where 
applicable) rows, respectively. The third and sixth (where applicable) rows show the 
distribution of nucleotides in the oligonucleotides obtained by site selection with the 
proteins shown in the first and fourth (where applicable) rows, respectively. Figure 3A
shows data for fingers designed to bind GCA; Figure 3B shows data for fingers designed 
to bind GAT; Figure 3C shows data for fingers designed to bind GGT. 

Figures 4A and 4B show properties of the engineered ZFP EP2C. Figure 4A
shows site selection data. The first row provides the amino acid sequences of residues -1
through +6 of the recognition helices for each of the three zinc fingers of the EP2C
protein. The second row shows the target sequence (5' to 3'), with the distribution of
nucleotides in the oligonucleotides obtained by site selection indicated below the target 
sequence. 

Figure 4B shows in vitro and in vivo assays for the binding specificity of EP2C. 
The first three columns show in vitro measurements of binding affinity of EP2C to its 
intended target sequence and several related sequences. The first column gives the name 
of each sequence (2C0 is the intended target sequence, compare to Figure 4A). The 
second column shows the nucleotide sequence of various target sequences, with
differences from the intended target sequence (2C0) highlighted. The third column
shows the Kd (in nM) for binding of EP2C to each of the target sequences. Kd values were
determined by gel shift assays, using 2-fold dilution series of EP2C. The right side of the
figure (fourth column and bar graph) shows relative luciferase activities (normalized to
β-galactosidase levels) in stable cell lines in which expression of EP2C is inducible.
Cells were co-transfected with a vector containing a luciferase coding region under the
transcriptional control of the target sequence shown in the same row of the figure, and a
control vector encoding β-galactosidase. Luciferase and β-galactosidase levels were
measured after induction of EP2C expression. Triplicate samples were assayed and the 
standard deviations are shown in the bar graph. pGL3 is a luciferase-encoding vector 
lacking EP2C target sequences. 3B is another negative control, in which luciferase 
expression is under transcriptional control of sequences (3B) unrelated to the EP2C target 
sequence. 

DEFINITIONS 

A zinc finger DNA binding protein is a protein or segment within a larger protein 
that binds DNA in a sequence-specific manner as a result of stabilization of protein 
structure through coordination of a zinc ion. The term zinc finger DNA binding protein 
is often abbreviated as zinc finger protein or ZFP. 

Zinc finger proteins can be engineered to recognize a selected target sequence in a 
nucleic acid. Any method known in the art or disclosed herein can be used to construct 
an engineered zinc finger protein or a nucleic acid encoding an engineered zinc finger 
protein. These include, but are not limited to, rational design, selection methods (e.g.,
phage display), random mutagenesis, combinatorial libraries, computer design, affinity
selection, use of databases matching zinc finger amino acid sequences with target subsite
nucleotide sequences, cloning from cDNA and/or genomic libraries, and synthetic
constructions. An engineered zinc finger protein can comprise a new combination of 
naturally-occurring zinc finger sequences. Methods for engineering zinc finger proteins 
are disclosed in co-owned WO 00/41566 and WO 00/42219; as well as in WO 98/53057; 
WO 98/53058; WO 98/53059 and WO 98/53060; the disclosures of which are hereby 
incorporated by reference in their entireties. Methods for identifying preferred target
sequences, and for engineering zinc finger proteins to bind to such preferred target
sequences, are disclosed in co-owned WO 00/42219. 

A designed zinc finger protein is a protein not occurring in nature whose 
design/composition results principally from rational criteria. Rational criteria for design 
include application of substitution rules and computerized algorithms for processing 
information in a database storing information of existing ZFP designs and binding data. 

A selected zinc finger protein is a protein not found in nature whose production 
results primarily from an empirical process such as phage display. 

The term naturally-occurring is used to describe an object that can be found in 
nature as distinct from being artificially produced by man. For example, a polypeptide or 
polynucleotide sequence that is present in an organism (including viruses) that can be 
isolated from a source in nature and which has not been intentionally modified by man in 
the laboratory is naturally-occurring. Generally, the term naturally-occurring refers to an 
object as present in a non-pathological (undiseased) individual, such as would be typical 
for the species. 

A nucleic acid is operably linked when it is placed into a functional relationship
with another nucleic acid sequence. For instance, a promoter or enhancer is operably 
linked to a coding sequence if it increases the transcription of the coding sequence. 
Operably linked means that the DNA sequences being linked are typically contiguous 
and, where necessary to join two protein coding regions, contiguous and in reading 
frame. However, since enhancers generally function when separated from the promoter
by up to several kilobases or more and intronic sequences may be of variable lengths, 
some polynucleotide elements may be operably linked but not contiguous. 

A specific binding affinity between, for example, a ZFP and a specific target site 
means a binding affinity of at least 1x10^ M^-1.

The terms "modulating expression," "inhibiting expression" and "activating
expression" of a gene refer to the ability of a zinc finger protein to activate or inhibit 
transcription of a gene. Activation includes prevention of subsequent transcriptional 
inhibition (i.e., prevention of repression of gene expression) and inhibition includes 
prevention of subsequent transcriptional activation (i.e., prevention of gene activation). 
Modulation can be assayed by determining any parameter that is indirectly or directly
affected by the expression of the target gene. Such parameters include, e.g., changes in
RNA or protein levels, changes in protein activity, changes in product levels, changes in
downstream gene expression, changes in reporter gene transcription (luciferase, CAT,
beta-galactosidase, GFP (see, e.g., Mistili & Spector, Nature Biotechnology 15:961-964
(1997))); changes in signal transduction, phosphorylation and dephosphorylation,
receptor-ligand interactions, second messenger concentrations (e.g., cGMP, cAMP, IP3,
and Ca2+), cell growth, neovascularization, in vitro, in vivo, and ex vivo. Such functional
effects can be measured by any means known to those skilled in the art, e.g., 
measurement of RNA or protein levels, measurement of RNA stability, identification of 
downstream or reporter gene expression, e.g., via chemiluminescence, fluorescence, 
colorimetric reactions, antibody binding, inducible markers, ligand binding assays; 
changes in intracellular second messengers such as cGMP and inositol triphosphate (IP3); 
changes in intracellular calcium levels; cytokine release, and the like. 

A "regulatory domain" refers to a protein or a protein subsequence that has 
transcriptional modulation activity. Typically, a regulatory domain is covalently or non- 
covalently linked to a ZFP to modulate transcription. Alternatively, a ZFP can act alone, 
without a regulatory domain, or with multiple regulatory domains to modulate 
transcription. 

A D-able subsite within a target site has the motif 5'NNGK3'. A target site 
containing one or more such motifs is sometimes described as a D-able target site. A 
zinc finger appropriately designed to bind to a D-able subsite is sometimes referred to as 
a D-able finger. Likewise a zinc finger protein containing at least one finger designed or 
selected to bind to a target site including at least one D-able subsite is sometimes referred 
to as a D-able zinc finger protein. 
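
As a sketch of how D-able subsites might be located in a candidate target site, the motif 5'NNGK3' can be expressed as a regular expression. The example below assumes the standard IUPAC nucleotide codes (N is any base, K is G or T); the names `D_ABLE` and `find_d_able_subsites` and the input sequences are hypothetical.

```python
import re

# 5'NNGK3' with N = any base and K = G or T (IUPAC codes). The lookahead lets
# overlapping occurrences be reported.
D_ABLE = re.compile(r"(?=([ACGT]{2}G[GT]))")

def find_d_able_subsites(target_site):
    """Return (0-based start, 4-base motif) for every 5'NNGK3' occurrence."""
    return [(m.start(), m.group(1)) for m in D_ABLE.finditer(target_site.upper())]

# Hypothetical sequences, written 5' to 3', chosen only to exercise the search.
d_able_hits = find_d_able_subsites("AAGTCC")
overlapping_hits = find_d_able_subsites("AAGGGT")
```

The search reports every position at which the four-base motif occurs on the strand supplied; deciding which occurrences align with the triplet subsites of a given design is left to the caller.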

DETAILED DESCRIPTION 

I. General 

Tables 1-5 list a collection of nonnaturally occurring zinc finger protein 
sequences and their corresponding target sites. The first column of each table is an 
internal reference number. The second column lists a 9 or 10 base target site bound by a
three-finger zinc finger protein, with the target sites listed in 5' to 3' orientation. The
third column provides SEQ ID NOs for the target site sequences listed in column 2. The
fourth, sixth and eighth columns list amino acid residues from the first, second and third
fingers, respectively, of a zinc finger protein which recognizes the target sequence listed
in the second column. For each finger, seven amino acids, occupying positions -1 to +6
of the finger, are listed. The numbering convention for zinc fingers is defined below.
Columns 5, 7 and 9 provide SEQ ID NOs for the amino acid sequences listed in columns
4, 6 and 8, respectively. The final column of each table lists the binding affinity (i.e., the
Kd in nM) of the zinc finger protein for its target site. Binding affinities are measured as
described below. 

Each finger binds to a triplet of bases within a corresponding target sequence.
The first finger binds to the first triplet starting from the 3' end of a target site, the second
finger binds to the second triplet, and the third finger binds the third (i.e., the 5'-most)
triplet of the target sequence. For example, the RSDSLTS finger (SEQ ID NO: 646) of
SBS# 201 (Table 2) binds to 5'TTG3', the ERSTLTR finger (SEQ ID NO: 851) binds
to 5'GCC3' and the QRADLRR finger (SEQ ID NO: 1056) binds to 5'GCA3'.
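
The pairing of fingers with triplets can be sketched as follows. The 9-base site used here is simply the three triplets quoted above joined in 5' to 3' order; the function name is illustrative.

```python
def assign_triplets(target_site, fingers):
    """Pair each finger with the triplet it reads.

    target_site: target strand written 5' to 3'.
    fingers: recognition sequences in order F1, F2, F3 (first to third finger).
    The first finger reads the 3'-most triplet, so the triplet list is reversed.
    """
    if len(target_site) != 3 * len(fingers):
        raise ValueError("expected exactly one triplet per finger")
    triplets = [target_site[i:i + 3] for i in range(0, len(target_site), 3)]
    return list(zip(fingers, reversed(triplets)))

for finger, triplet in assign_triplets("GCAGCCTTG", ["RSDSLTS", "ERSTLTR", "QRADLRR"]):
    print(finger, "binds 5'" + triplet + "3'")
```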

Table 6 lists a collection of consensus sequences for zinc fingers and the target
sites bound by such sequences. Conventional one letter amino acid codes are used to
designate amino acids occupying consensus positions. The symbol "X" designates a
nonconsensus position that can in principle be occupied by any amino acid. In most zinc
fingers of the C2H2 type, binding specificity is principally conferred by residues -1, +2,
+3 and +6. Accordingly, consensus sequences determining binding specificity typically
include at least these residues. Consensus sequences are useful for designing zinc fingers
to bind to a given target sequence. Residues occupying other positions can be selected
based on sequences in Tables 1-5, or other known zinc finger sequences. Alternatively,
these positions can be randomized with a plurality of candidate amino acids and screened
against one or more target sequences to refine or improve binding
specificity. In general, the same consensus sequence can be used for design of a zinc
finger regardless of the relative position of that finger in a multi-finger zinc finger
protein. For example, the sequence RXDNXXR can be used to design an N-terminal,
central or C-terminal finger of a three-finger protein. However, some consensus sequences
are most suitable for designing a zinc finger to occupy a particular position in a
multi-finger protein. For example, the consensus sequence RXDHXXQ is most suitable for
designing a C-terminal finger of a three-finger protein.
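
A consensus sequence of this kind can be checked against a candidate seven-residue finger sequence (positions -1 to +6) with a simple wildcard comparison. The sketch below is illustrative only; the candidate RSDNLAR is a hypothetical sequence chosen for the example and is not taken from the tables.

```python
def matches_consensus(helix, consensus):
    """True if a seven-residue sequence (positions -1 to +6) fits a consensus.

    'X' in the consensus is a nonconsensus position and accepts any amino acid.
    """
    if len(helix) != len(consensus):
        return False
    return all(c == "X" or c == h for h, c in zip(helix, consensus))

print(matches_consensus("RSDNLAR", "RXDNXXR"))  # True (hypothetical candidate)
print(matches_consensus("RSDNLAR", "RXDHXXQ"))  # False
```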

II. Characteristics of Zinc Finger Proteins

Zinc finger proteins are formed from zinc finger components. For example, zinc
finger proteins can have one to thirty-seven fingers, commonly having 2, 3, 4, 5 or 6
fingers. A zinc finger protein recognizes and binds to a target site (sometimes referred to
as a target segment) that represents a relatively small subsequence within a target gene.
Each component finger of a zinc finger protein can bind to a subsite within the target site.
The subsite includes a triplet of three contiguous bases all on the same strand (sometimes
referred to as the target strand). The subsite may or may not also include a fourth base on
the opposite strand that is the complement of the base immediately 3' of the three
contiguous bases on the target strand. In many zinc finger proteins, a zinc finger binds to
its triplet subsite substantially independently of other fingers in the same zinc finger
protein. Accordingly, the binding specificity of a zinc finger protein containing multiple
fingers is usually approximately the aggregate of the specificities of its component
fingers. For example, if a zinc finger protein is formed from first, second and third
fingers that individually bind to triplets XXX, YYY, and ZZZ, the binding specificity of
the zinc finger protein is 3'XXX YYY ZZZ5'.

The relative order of fingers in a zinc finger protein from N-terminal to
C-terminal determines the relative order of triplets in the 3' to 5' direction in the target.
For example, if a zinc finger protein comprises from N-terminal to C-terminal first,
second and third fingers that individually bind, respectively, to triplets 5'GAC3',
5'GTA3' and 5'GGC3', then the zinc finger protein binds to the target segment
3'CAGATGCGG5'. If the zinc finger protein comprises the fingers in another order, for
example, second finger, first finger, third finger, then the zinc finger protein binds to a
target segment comprising a different permutation of triplets, in this example,
3'ATGCAGCGG5' (see Berg & Shi, Science 271, 1081-1085 (1996)). The assessment
of binding properties of a zinc finger protein as the aggregate of its component fingers
may, in some cases, be influenced by context-dependent interactions of multiple fingers
binding in the same protein.

Two or more zinc finger proteins can be linked to have a target specificity that is
the aggregate of that of the component zinc finger proteins (see e.g., Kim & Pabo, PNAS
95, 2812-2817 (1998)). For example, a first zinc finger protein having first, second and
third component fingers that respectively bind to XXX, YYY and ZZZ can be linked to a
second zinc finger protein having first, second and third component fingers with binding
specificities AAA, BBB and CCC. The binding specificity of the combined first and
second proteins is thus 3'XXXYYYZZZ___AAABBBCCC5', where the underline
indicates a short intervening region (typically 0-5 bases of any type). In this situation, the
target site can be viewed as comprising two target segments separated by an intervening
segment.
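
The finger-order rule for a single protein, using the example of fingers that bind 5'GAC3', 5'GTA3' and 5'GGC3', can be sketched as follows. The function name is illustrative, and the sketch ignores the context-dependent effects mentioned above.

```python
def target_segment(finger_triplets):
    """Build the target segment from the triplets bound by each finger.

    finger_triplets: triplets written 5'->3', listed in N- to C-terminal finger order.
    Returns (segment read 5'->3', the same segment read 3'->5').
    The N-terminal finger reads the 3'-most triplet, hence the reversal.
    """
    five_to_three = "".join(reversed(finger_triplets))
    three_to_five = five_to_three[::-1]
    return five_to_three, three_to_five

print(target_segment(["GAC", "GTA", "GGC"]))  # ('GGCGTAGAC', 'CAGATGCGG')
print(target_segment(["GTA", "GAC", "GGC"]))  # ('GGCGACGTA', 'ATGCAGCGG')
```

The second element of each result matches the 3' to 5' segments given in the finger-order example.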

Linkage can be accomplished using any of the following peptide linkers:
TGEKP (SEQ. ID. No:2) (Liu et al., 1997, supra); (G4S)n (SEQ. ID. No:3) (Kim et
al., PNAS 93, 1156-1160 (1996)); GGRRGGGS (SEQ. ID. No:4); LRQRDGERP (SEQ.
ID. No:5); LRQKDGGGSERP (SEQ. ID. No:6); and LRQKD(G3S)2ERP (SEQ. ID. No:7).
Alternatively, flexible linkers can be rationally designed using computer programs
capable of modeling both DNA-binding sites and the peptides themselves, or by phage
display methods. In a further variation, noncovalent linkage can be achieved by fusing
two zinc finger proteins with domains promoting heterodimer formation of the two zinc
finger proteins. For example, one zinc finger protein can be fused with fos and the other
with jun (see Barbas et al., WO 95/19431).

Linkage of two zinc finger proteins is advantageous for conferring a unique
binding specificity within a mammalian genome. A typical mammalian diploid genome
consists of 3 x 10^9 bp. Assuming that the four nucleotides A, C, G, and T are randomly
distributed, a given 9 bp sequence is present ~23,000 times. Thus a ZFP recognizing a 9
bp target with absolute specificity would have the potential to bind to ~23,000 sites
within the genome. An 18 bp sequence is present once in 3.4 x 10^10 bp, or about once in
a random DNA sequence whose complexity is ten times that of a mammalian genome.
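
The arithmetic behind these estimates can be reproduced with a short calculation. The sketch assumes that a site can occur on either strand (a factor of two), which is one way of arriving at the quoted figures; the text does not state the derivation explicitly.

```python
GENOME_BP = 3e9  # genome size used in the estimate above

def expected_sites(site_len, genome_bp=GENOME_BP, strands=2):
    """Expected occurrences of one specific site in random sequence of equal base composition."""
    return strands * genome_bp / 4 ** site_len

def bp_per_site(site_len, strands=2):
    """Length of random sequence expected to contain one occurrence of the site."""
    return 4 ** site_len / strands

print(round(expected_sites(9)))         # 22888, i.e. about 23,000
print(bp_per_site(18))                  # 34359738368.0, i.e. about 3.4 x 10^10
print(bp_per_site(18) / GENOME_BP)      # about 11, i.e. roughly ten genome equivalents
```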

A component finger of a zinc finger protein typically contains about 30 amino acids
and has the following motif (N-C):

(SEQ. ID. No:8)

Cys-(X)2-4-Cys-X.X.X.X.X.X.X.X.X.X.X.X-His-(X)3-5-His
                        -1 1 2 3 4 5 6 7

The two invariant histidine residues and two invariant cysteine residues in a single
beta turn are co-ordinated through zinc (see, e.g., Berg & Shi, Science 271, 1081-1085
(1996)). The above motif shows a numbering convention that is standard in the field for
the region of a zinc finger conferring binding specificity. The amino acid on the left
(N-terminal side) of the first invariant His residue is assigned the number +6, and other
amino acids further to the left are assigned successively decreasing numbers. The alpha
helix begins at residue 1 and extends to the residue following the second conserved
histidine. The entire helix is therefore of variable length, between 11 and 13 residues.

The process of designing or selecting a nonnaturally occurring or variant ZFP
typically starts with a natural ZFP as a source of framework residues. The process of
design or selection serves to define nonconserved positions (i.e., positions -1 to +6) so as
to confer a desired binding specificity. One suitable ZFP is the DNA binding domain of
the mouse transcription factor Zif268. The DNA binding domain of this protein has the
amino acid sequence:

YACPVESCDRRFSRSDELTRHIRIHTGQKP (F1) (SEQ. ID. No:9)
FQCRICMRNFSRSDHLTTHIRTHTGEKP (F2) (SEQ. ID. No:10)
FACDICGRKFARSDERKRHTKIHLRQK (F3) (SEQ. ID. No:11)
and binds to a target 5' GCG TGG GCG 3' (SEQ ID No:12).
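
As an illustration of the numbering convention, the sketch below pulls the seven residues at positions -1 to +6 out of each Zif268 finger by matching the Cys-(X)2-4-Cys-(X)12-His-(X)3-5-His motif and taking the seven residues immediately N-terminal of the first invariant histidine. The regular expression and function name are illustrative and assume the twelve-residue spacing shown in the motif.

```python
import re

# Cys-(X)2-4-Cys-(X)12-His-(X)3-5-His; the captured group is the 12-residue stretch
# between the second Cys and the first invariant His.
FINGER_MOTIF = re.compile(r"C.{2,4}C(.{12})H.{3,5}H")

def recognition_residues(finger_seq):
    """Return the residues at positions -1, 1, 2, 3, 4, 5, 6 of one finger."""
    match = FINGER_MOTIF.search(finger_seq)
    if match is None:
        raise ValueError("no C2H2 motif found")
    # Position +6 is the residue just before the first invariant His.
    return match.group(1)[-7:]

ZIF268 = {
    "F1": "YACPVESCDRRFSRSDELTRHIRIHTGQKP",
    "F2": "FQCRICMRNFSRSDHLTTHIRTHTGEKP",
    "F3": "FACDICGRKFARSDERKRHTKIHLRQK",
}

for name, seq in ZIF268.items():
    print(name, recognition_residues(seq))
# F1 RSDELTR
# F2 RSDHLTT
# F3 RSDERKR
```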

Another suitable natural zinc finger protein as a source of framework residues is 
Sp-1. The Sp-1 sequence used for construction of zinc finger proteins corresponds to 
amino acids 531 to 624 in the Sp-1 transcription factor. This sequence is 94 amino acids 
in length. The amino acid sequence of Sp-1 is as follows: 
PGKKKQHICHIQGCGKVYGKTSHLRAHLRWHTGERP 
FMCTWSYCGKRFTRSDELQRHKRTHTGEKK 
FACPECPKRFMRSDHLSKHIKTHQNKKG (SEQ. ID. No: 13) 
Sp-1 binds to a target site 5'GGG GCG GGG3' (SEQ ID No: 14). 

An alternate form of Sp-1, an Sp-1 consensus sequence, has the following amino 
acid sequence: 
mekkngsgd
PGKKKQHACPECGKSFSKSSHLRAHQRTHTGERP
YKCPECGKSFSRSDELQRHQRTHTGEKP
YKCPECGKSFSRSDHLSKHQRTHQNKKG (SEQ. ID. No:15) (lower case letters are a
leader sequence from Shi & Berg, Chemistry and Biology 1, 83-89 (1995)). The optimal
binding sequence for the Sp-1 consensus sequence is 5'GGGGCGGGG3' (SEQ ID No:
16). Other suitable ZFPs are described below.

There are a number of substitution rules that assist rational design of some zinc
finger proteins (see Desjarlais & Berg, PNAS 90, 2256-2260 (1993); Choo & Klug, PNAS
91, 11163-11167 (1994); Desjarlais & Berg, PNAS 89, 7345-7349 (1992); Jamieson et
al., supra; Choo et al., WO 98/53057; WO 98/53058; WO 98/53059; WO 98/53060).
Many of these rules are supported by site-directed mutagenesis of the three-finger domain
of the ubiquitous transcription factor, Sp-1 (Desjarlais and Berg, 1992; 1993). One of
these rules is that a 5' G in a DNA triplet can be bound by a zinc finger incorporating
arginine at position 6 of the recognition helix. Another substitution rule is that a G in the
middle of a subsite can be recognized by including a histidine residue at position 3 of a
zinc finger. A further substitution rule is that asparagine can be incorporated to recognize
A in the middle of a triplet, aspartic acid, glutamic acid, serine or threonine can be
incorporated to recognize C in the middle of a triplet, and amino acids with small side
chains such as alanine can be incorporated to recognize T in the middle of a triplet. A
further substitution rule is that the 3' base of a triplet subsite can be recognized by
incorporating the following amino acids at position -1 of the recognition helix: arginine
to recognize G, glutamine to recognize A, glutamic acid (or aspartic acid) to recognize C,
and threonine to recognize T. Although these substitution rules are useful in designing
zinc finger proteins, they do not take into account all possible target sites. Furthermore,
the assumption underlying the rules, namely that a particular amino acid in a zinc finger
is responsible for binding to a particular base in a subsite, is only approximate.
Context-dependent interactions between proximate amino acids in a finger, or binding of
multiple amino acids to a single base or vice versa, can cause variation of the binding
specificities predicted by the existing substitution rules.
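
The substitution rules listed above can be written down as lookup tables. The sketch below is only a restatement of those rules in code, with illustrative names; as the text notes, such rules are approximate and do not capture context or position dependence.

```python
# Position 6 reads the 5' base; the only rule given above is Arg for a 5' G.
POS6_FOR_5PRIME = {"G": ["R"]}
# Position 3 reads the middle base.
POS3_FOR_MIDDLE = {"G": ["H"], "A": ["N"], "C": ["D", "E", "S", "T"], "T": ["A"]}
# Position -1 reads the 3' base.
POS_MINUS1_FOR_3PRIME = {"G": ["R"], "A": ["Q"], "C": ["E", "D"], "T": ["T"]}

def suggest_residues(triplet):
    """Map a triplet (written 5'->3') to candidate residues at positions -1, 3 and 6."""
    five, middle, three = triplet.upper()
    return {
        -1: POS_MINUS1_FOR_3PRIME[three],
        3: POS3_FOR_MIDDLE[middle],
        6: POS6_FOR_5PRIME.get(five, []),  # empty when no rule is listed
    }

print(suggest_residues("GCG"))
# {-1: ['R'], 3: ['D', 'E', 'S', 'T'], 6: ['R']}
```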

The technique of phage display provides a largely empirical means of generating
zinc finger proteins with a desired target specificity (see e.g., Rebar, US 5,789,538; Choo
et al., WO 96/06166; Barbas et al., WO 95/19431 and WO 98/54311; Jamieson et al.,
supra). The method can be used in conjunction with, or as an alternative to, rational
design. The method involves the generation of diverse libraries of mutagenized zinc
finger proteins, followed by the isolation of proteins with desired DNA-binding
properties using affinity selection methods. To use this method, the experimenter
typically proceeds as follows. First, a gene for a zinc finger protein is mutagenized to
introduce diversity into regions important for binding specificity and/or affinity. In a
typical application, this is accomplished via randomization of a single finger at positions
-1, +2, +3, and +6, and sometimes accessory positions such as +1, +5, +8 and +10. Next,
the mutagenized gene is cloned into a phage or phagemid vector as a fusion with gene III
of a filamentous phage, which encodes the coat protein pIII. The zinc finger gene is
inserted between segments of gene III encoding the membrane export signal peptide and
the remainder of pIII, so that the zinc finger protein is expressed as an amino-terminal
fusion with pIII in the mature, processed protein. When using phagemid vectors, the
mutagenized zinc finger gene may also be fused to a truncated version of gene III
encoding, minimally, the C-terminal region required for assembly of pIII into the phage
particle. The resultant vector library is transformed into E. coli and used to produce
filamentous phage which express variant zinc finger proteins on their surface as fusions
with the coat protein pIII. If a phagemid vector is used, then this step requires
superinfection with helper phage. The phage library is then incubated with the target DNA
site, and affinity selection methods are used to isolate phage which bind target with high
affinity from bulk phage. Typically, the DNA target is immobilized on a solid support,
which is then washed under conditions sufficient to remove all but the tightest binding
phage. After washing, any phage remaining on the support are recovered via elution
under conditions which disrupt zinc finger-DNA binding. Recovered phage are used to
infect fresh E. coli, which is then amplified and used to produce a new batch of phage
particles. Selection and amplification are then repeated as many times as is necessary to
enrich the phage pool for tight binders such that these may be identified using sequencing
and/or screening methods. Although the method is illustrated for pIII fusions, analogous
principles can be used to screen ZFP variants as pVIII fusions.
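
To give a sense of the library sizes involved, the sketch below counts the protein variants produced by randomizing a given number of positions. It assumes, purely for illustration, that every randomized position can take any of the 20 amino acids; actual libraries may use restricted randomization schemes.

```python
def library_size(n_positions, choices_per_position=20):
    """Number of distinct protein variants when n_positions are fully randomized."""
    return choices_per_position ** n_positions

print(library_size(4))  # positions -1, +2, +3, +6 -> 160000
print(library_size(8))  # plus accessory positions +1, +5, +8, +10 -> 25600000000
```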

In certain embodiments, the sequence bound by a particular zinc finger protein is
determined by conducting binding reactions (see, e.g., conditions for determination of Kd,
infra) between the protein and a pool of randomized double-stranded oligonucleotide
sequences. The binding reaction is analyzed by an electrophoretic mobility shift assay
(EMSA), in which protein-DNA complexes undergo retarded migration in a gel and can
be separated from unbound nucleic acid. Oligonucleotides which have bound the finger
are purified from the gel and amplified, for example, by a polymerase chain reaction.
The selection (i.e., binding reaction and EMSA analysis) is then repeated as many times
as desired, with the selected oligonucleotide sequences. In this way, the binding
specificity of a zinc finger protein having a particular amino acid sequence is determined.

Zinc finger proteins are often expressed with a heterologous domain as fusion
proteins. Common domains for addition to the ZFP include, e.g., transcription factor
domains (activators, repressors, co-activators, co-repressors), silencers, oncogenes (e.g.,
myc, jun, fos, myb, max, mad, rel, ets, bcl, myb, mos family members etc.); DNA repair
enzymes and their associated factors and modifiers; DNA rearrangement enzymes and
their associated factors and modifiers; chromatin associated proteins and their modifiers
(e.g., kinases, acetylases and deacetylases); and DNA modifying enzymes (e.g.,
methyltransferases, topoisomerases, helicases, ligases, kinases, phosphatases,
polymerases, endonucleases) and their associated factors and modifiers. A preferred
domain for fusing with a ZFP when the ZFP is to be used for repressing expression of a
target gene is a KRAB repression domain from the human KOX-1 protein (Thiesen et al.,
New Biologist 2, 363-374 (1990); Margolin et al., Proc. Natl. Acad. Sci. USA 91,
4509-4513 (1994); Pengue et al., Nucl. Acids Res. 22:2908-2914 (1994); Witzgall et al., Proc.
Natl. Acad. Sci. USA 91, 4514-4518 (1994)). Preferred domains for achieving activation
include the HSV VP16 activation domain (see, e.g., Hagmann et al., J. Virol. 71,
5952-5962 (1997)); nuclear hormone receptors (see, e.g., Torchia et al., Curr. Opin. Cell Biol.
10:373-383 (1998)); the p65 subunit of nuclear factor kappa B (Bitko & Barik, J. Virol.
72:5610-5618 (1998) and Doyle & Hunt, Neuroreport 8:2937-2942 (1997); Liu et al.,
Cancer Gene Ther. 5:3-28 (1998)); or artificial chimeric functional domains such as
VP64 (Seipel et al., EMBO J. 11, 4961-4968 (1992)).

An important factor in the administration of polypeptide compounds, such as the
ZFPs, is ensuring that the polypeptide has the ability to traverse the plasma membrane of
a cell, or the membrane of an intra-cellular compartment such as the nucleus. Cellular
membranes are composed of lipid-protein bilayers that are freely permeable to small,
nonionic lipophilic compounds and are inherently impermeable to polar compounds,
macromolecules, and therapeutic or diagnostic agents. However, proteins and other
compounds such as liposomes have been described, which have the ability to translocate
polypeptides such as ZFPs across a cell membrane.

For example, "membrane translocation polypeptides" have amphiphilic or
hydrophobic amino acid subsequences that have the ability to act as
membrane-translocating carriers. In one embodiment, homeodomain proteins have the ability to
translocate across cell membranes. The shortest internalizable peptide of a homeodomain
protein, Antennapedia, was found to be the third helix of the protein, from amino acid
position 43 to 58 (see, e.g., Prochiantz, Current Opinion in Neurobiology 6:629-634
(1996)). Another subsequence, the h (hydrophobic) domain of signal peptides, was found
to have similar cell membrane translocation characteristics (see, e.g., Lin et al., J. Biol.
Chem. 270:14255-14258 (1995)).

Examples of peptide sequences which can be linked to a ZFP, for facilitating
uptake of ZFP into cells, include, but are not limited to: an 11 amino acid peptide of the
tat protein of HIV; a 20 residue peptide sequence which corresponds to amino acids
84-103 of the p16 protein (see Fahraeus et al., Current Biology 6:84 (1996)); the third helix
of the 60-amino acid long homeodomain of Antennapedia (Derossi et al., J. Biol. Chem.
269:10444 (1994)); the h region of a signal peptide such as the Kaposi fibroblast growth
factor (K-FGF) h region (Lin et al., supra); or the VP22 translocation domain from HSV
(Elliot & O'Hare, Cell 88:223-233 (1997)). Other suitable chemical moieties that
provide enhanced cellular uptake may also be chemically linked to ZFPs.

Toxin molecules also have the ability to transport polypeptides across cell
membranes. Often, such molecules are composed of at least two parts (called "binary
toxins"): a translocation or binding domain or polypeptide and a separate toxin domain
or polypeptide. Typically, the translocation domain or polypeptide binds to a cellular
receptor, and then the toxin is transported into the cell. Several bacterial toxins,
including Clostridium perfringens iota toxin, diphtheria toxin (DT), Pseudomonas
exotoxin A (PE), pertussis toxin (PT), Bacillus anthracis toxin, and pertussis adenylate
cyclase (CYA), have been used in attempts to deliver peptides to the cell cytosol as
internal or amino-terminal fusions (Arora et al., J. Biol. Chem. 268:3334-3341 (1993);
Perelle et al., Infect. Immun. 61:5147-5156 (1993); Stenmark et al., J. Cell Biol.
113:1025-1032 (1991); Donnelly et al., PNAS 90:3530-3534 (1993); Carbonetti et al.,
Abstr. Annu. Meet. Am. Soc. Microbiol. 95:295 (1995); Sebo et al., Infect. Immun.
63:3851-3857 (1995); Klimpel et al., PNAS U.S.A. 89:10277-10281 (1992); and Novak et
al., J. Biol. Chem. 267:17186-17193 (1992)).

Such subsequences can be used to translocate ZFPs across a cell membrane. 
ZFPs can be conveniently fused to or derivatized with such sequences. Typically, the 
translocation sequence is provided as part of a fusion protein. Optionally, a linker can be 
used to link the ZFP and the translocation sequence. Any suitable linker can be used, 
e.g., a peptide linker. 

III. Position Dependence Of Subsite Recognition By Zinc Fingers

A number of the polypeptides disclosed herein have been characterized using the
methods disclosed in parent application Serial No. 09/716,637 (the disclosure of which is
hereby incorporated by reference in its entirety), in particular with respect to the effect of
their position, within a multi-finger protein, on their sequence specificity. The results of
these investigations provide a set of zinc finger sequences that are optimized for
recognition of certain triplet target subsites whose 5'-most nucleotide is a G (i.e., GNN
triplet subsites). Thus, particular zinc finger sequences which recognize each of the GNN
triplet subsites, from each position of a three-finger zinc finger protein, are provided. See
Figure 2. It will be clear to those of skill in the art that the optimized, position-specific
zinc finger sequences disclosed herein for recognition of GNN target subsites are not
limited to use in three-finger proteins. For example, they are also useful in six-finger
proteins, which can be made by linkage of two three-finger proteins.

A number of zinc finger amino acid sequences which are reported to bind to target
subsites in which the 5'-most nucleotide residue is a G (i.e., GNN subsites) have recently
been disclosed. Segal et al. (1999) Proc. Natl. Acad. Sci. USA 96:2758-2763; Dreier et
al. (2000) J. Mol. Biol. 303:489-502; U.S. Patent No. 6,140,081. These GNN-binding
zinc fingers were obtained by selection of finger 2 sequences from phage display libraries
of three-finger proteins, in which certain amino acid residues of finger 2 had been
randomized. Due to the manner in which they were selected, it is not clear whether these
sequences would have the same target subsite specificity if they were present in the F1
and/or F3 positions.

Use of the methods and compositions disclosed herein has now allowed 
identification of specific zinc finger sequences that bind each of the 16 GNN triplet 
subsites, and for the first time, provides zinc finger sequences that are optimized for 
recognition of these triplet subsites in a position-dependent fashion. Moreover, in vivo 
studies of these optimized designs reveal that the functionality of a ZFP is correlated with 
its binding affinity to its target sequence. See Example 6, infra. 
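
For orientation, the 16 GNN triplet subsites and the finger positions at which each must be recognized in a three-finger protein can be enumerated as follows. This is a bookkeeping sketch only; the position labels F1 to F3 follow the usage above.

```python
from itertools import product

# All triplets whose 5'-most base is G: G followed by any two bases.
GNN_TRIPLETS = ["G" + a + b for a, b in product("ACGT", repeat=2)]

# Each triplet may need a differently optimized finger at each position.
FINGER_POSITIONS = ("F1", "F2", "F3")
DESIGN_SLOTS = [(triplet, pos) for triplet in GNN_TRIPLETS for pos in FINGER_POSITIONS]

print(len(GNN_TRIPLETS))   # 16
print(len(DESIGN_SLOTS))   # 48
print(GNN_TRIPLETS[:4])    # ['GAA', 'GAC', 'GAG', 'GAT']
```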

As a result of the discovery, disclosed herein, that sequence recognition by zinc 
fingers is position-dependent, it is clear that existing design rules will not, in and of 
themselves, be applicable to every situation in which it is necessary to construct a 
sequence-specific ZFP. The results disclosed herein show that many zinc fingers that are 
constructed based on design rules exhibit the sequence specificity predicted by those 
design rules only at certain finger positions. The position-specific zinc fingers disclosed 
herein are likely to function more efficiently in vivo and in cultured cells, with fewer 
nonspecific effects. Highly specific ZFPs, made using position-specific zinc fingers, will 
be useful tools in studying gene function and will find broad applications in areas as 
diverse as human therapeutics and plant engineering. 

IV. Production of Zinc Finger Proteins 

ZFP polypeptides and nucleic acids encoding the same can be made using routine
techniques in the field of recombinant genetics. Basic texts disclosing the general
methods include Sambrook et al., Molecular Cloning, A Laboratory Manual (2nd ed.
1989); Kriegler, Gene Transfer and Expression: A Laboratory Manual (1990); and
Current Protocols in Molecular Biology (Ausubel et al., eds., 1994). In addition,
nucleic acids less than about 100 bases can be custom ordered from any of a variety of
commercial sources, such as The Midland Certified Reagent Company
(mcrc@oligos.com), The Great American Gene Company (http://www.genco.com),
ExpressGen Inc. (www.expressgen.com), and Operon Technologies Inc. (Alameda, CA).
Similarly, peptides can be custom ordered from any of a variety of sources, such as
PeptidoGenic (pkim@ccnet.com), HTI Bio-products, Inc. (http://www.htibio.com), BMA
Biomedicals Ltd (U.K.), and Bio.Synthesis, Inc.

Oligonucleotides can be chemically synthesized according to the solid phase
phosphoramidite triester method first described by Beaucage & Caruthers, Tetrahedron
Letts. 22:1859-1862 (1981), using an automated synthesizer, as described in Van
Devanter et al., Nucleic Acids Res. 12:6159-6168 (1984). Purification of
oligonucleotides is by either denaturing polyacrylamide gel electrophoresis or by reverse
phase HPLC. The sequence of the cloned genes and synthetic oligonucleotides can be
verified after cloning using, e.g., the chain termination method for sequencing
double-stranded templates of Wallace et al., Gene 16:21-26 (1981).

Two alternative methods are typically used to create the coding sequences
required to express newly designed DNA-binding peptides. One protocol is a PCR-based
assembly procedure that utilizes six overlapping oligonucleotides (Fig. 1). Three
oligonucleotides (oligos 1, 3, and 5 in Figure 1) correspond to "universal" sequences that
encode portions of the DNA-binding domain between the recognition helices. These
oligonucleotides typically remain constant for all zinc finger constructs. The other three
"specific" oligonucleotides (oligos 2, 4, and 6 in Fig. 1) are designed to encode the
recognition helices. These oligonucleotides contain substitutions primarily at positions
-1, 2, 3 and 6 on the recognition helices, making them specific for each of the different
DNA-binding domains.

The PCR synthesis is carried out in two steps. First, a double stranded DNA
template is created by combining the six oligonucleotides (three universal, three specific)
in a four cycle PCR reaction with a low temperature annealing step, thereby annealing the
oligonucleotides to form a DNA "scaffold." The gaps in the scaffold are filled in by
high-fidelity thermostable polymerase; the combination of Taq and Pfu polymerases also
suffices. In the second phase of construction, the zinc finger template is amplified by
external primers designed to incorporate restriction sites at either end for cloning into a
shuttle vector or directly into an expression vector.

An alternative method of cloning the newly designed DNA-binding proteins relies
on annealing complementary oligonucleotides encoding the specific regions of the
desired ZFP. This particular application requires that the oligonucleotides be
phosphorylated prior to the final ligation step. This is usually performed before setting
up the annealing reactions. In brief, the "universal" oligonucleotides encoding the
constant regions of the proteins (oligos 1, 2 and 3 above) are annealed with their
complementary oligonucleotides. Additionally, the "specific" oligonucleotides encoding
the finger recognition helices are annealed with their respective complementary
oligonucleotides. These complementary oligos are designed to fill in the region which
was previously filled in by polymerase in the above-mentioned protocol. The
complementary oligos to the common oligos 1 and finger 3 are engineered to leave
overhanging sequences specific for the restriction sites used in cloning into the vector of
choice in the following step. The second assembly protocol differs from the initial
protocol in the following aspects: the "scaffold" encoding the newly designed ZFP is
composed entirely of synthetic DNA, thereby eliminating the polymerase fill-in step;
additionally, the fragment to be cloned into the vector does not require amplification.
Lastly, the design of leaving sequence-specific overhangs eliminates the need for
restriction enzyme digests of the inserting fragment. Alternatively, changes to ZFP
recognition helices can be created using conventional site-directed mutagenesis methods.

Both assembly methods require that the resulting fragment encoding the newly
designed ZFP be ligated into a vector. Ultimately, the ZFP-encoding sequence is cloned
into an expression vector. Expression vectors that are commonly utilized include, but are
not limited to, a modified pMAL-c2 bacterial expression vector (New England BioLabs)
or a eukaryotic expression vector, pcDNA (Promega). The final constructs are verified
by sequence analysis.

Any suitable method of protein purification known to those of skill in the art can 
be used to purify ZFPs (see, Ausubel, supra, Sambrook, supra). In addition, any suitable 
host can be used for expression, e.g., bacterial cells, insect cells, yeast cells, mammalian 
cells, and the like. 

Expression of a zinc finger protein fused to a maltose binding protein (MBP-ZFP)
in bacterial strain JM109 allows for straightforward purification through an amylose
column (NEB). High expression levels of the zinc finger chimeric protein can be
obtained by induction with IPTG, since the MBP-ZFP fusion in the pMal-c2 expression
plasmid is under the control of the tac promoter (NEB). Bacteria containing the
MBP-ZFP fusion plasmids are inoculated into 2xYT medium containing 10 µM ZnCl2, 0.02%
glucose, plus 50 µg/ml ampicillin and shaken. At mid-exponential growth IPTG
is added to 0.3 mM and the cultures are allowed to shake. After 3 hours the bacteria are
harvested by centrifugation, disrupted by sonication or by passage through a French
pressure cell or through the use of lysozyme, and insoluble material is removed by
centrifugation. The MBP-ZFP proteins are captured on an amylose-bound resin, washed
extensively with buffer containing 20 mM Tris-HCl (pH 7.5), 200 mM NaCl, 5 mM DTT
and 50 µM ZnCl2, then eluted with maltose in essentially the same buffer (purification is
based on a standard protocol from NEB). Purified proteins are quantitated and stored for
biochemical analysis.

The dissociation constants of the purified proteins, e.g., Kd, are typically
characterized via electrophoretic mobility shift assays (EMSA) (Buratowski & Chodosh,
in Current Protocols in Molecular Biology 12.2.1-12.2.7 (Ausubel ed., 1996)).
Affinity is measured by titrating purified protein against a fixed amount of labeled
double-stranded oligonucleotide target. The target typically comprises the natural
binding site sequence flanked by the 3 bp found in the natural sequence and additional,
constant flanking sequences. The natural binding site is typically 9 bp for a three-finger
protein and 2 x 9 bp + intervening bases for a six-finger ZFP. The annealed
oligonucleotide targets possess a 1 base 5' overhang which allows for efficient labeling
of the target with T4 phage polynucleotide kinase. For the assay the target is added at a
concentration of 1 nM or lower (the actual concentration is kept at least 10-fold lower
than the expected dissociation constant), purified ZFPs are added at various
concentrations, and the reaction is allowed to equilibrate for at least 45 min. In addition
the reaction mixture also contains 10 mM Tris (pH 7.5), 100 mM KCl, 1 mM MgCl2, 0.1
mM ZnCl2, 5 mM DTT, 10% glycerol, 0.02% BSA. (NB: in earlier assays poly d(IC)
was also added at 10-100 µg/µl.)

The equilibrated reactions are loaded onto a 10% polyacrylamide gel, which has
been pre-run for 45 min in Tris/glycine buffer; bound and unbound labeled target are then
resolved by electrophoresis at 150 V. (Alternatively, 10-20% gradient Tris-HCl gels,
containing a 4% polyacrylamide stacker, can be used.) The dried gels are visualized by
autoradiography or phosphorimaging, and the apparent Kd is determined by calculating
the protein concentration that gives half-maximal binding.
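
The half-maximal-binding rule can be illustrated with a minimal Python sketch. It assumes that band intensities have already been converted to a fraction of target bound at each protein concentration. The function name `apparent_kd`, the interpolation on a log-concentration scale, and the example titration are illustrative assumptions and are not part of the protocol described here.

```python
import math


def apparent_kd(protein_nM, fraction_bound):
    """Estimate the apparent Kd as the protein concentration at half-maximal binding.

    Interpolates linearly in log10(concentration) between the two titration
    points that bracket half of the highest observed fraction bound.
    All concentrations must be positive.
    """
    pairs = sorted(zip(protein_nM, fraction_bound))
    half = max(f for _, f in pairs) / 2.0
    for (c_lo, f_lo), (c_hi, f_hi) in zip(pairs, pairs[1:]):
        if f_lo <= half <= f_hi and f_hi > f_lo:
            t = (half - f_lo) / (f_hi - f_lo)
            log_c = math.log10(c_lo) + t * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_c
    raise ValueError("titration does not bracket half-maximal binding")


# Hypothetical titration, simulated from a simple 1:1 binding curve with Kd = 2 nM
concs = [0.125, 0.25, 0.5, 1, 2, 4, 8, 16, 32]
bound = [c / (c + 2.0) for c in concs]
kd_estimate = apparent_kd(concs, bound)
```

Because the labeled target is kept at least 10-fold below the expected Kd, the total protein concentration approximates the free protein concentration, which is why the half-maximal point can be read as an apparent Kd. If the titration does not reach saturation, as in this simulated example, the estimate falls somewhat below the true value.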

The assays can also include determining active fractions in the protein 
preparations. Active fractions are determined by stoichiometric gel shifts where proteins 
are titrated against a high concentration of target DNA. Titrations are done at 100, 50, 
and 25% of target (usually at micromolar levels). 
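
As a rough illustration of the stoichiometric gel shift, the following Python sketch computes an active fraction from hypothetical data. It rests on the assumption, not stated explicitly above, that at target concentrations far above the Kd every active protein molecule is bound, so that the concentration of shifted target equals the concentration of active protein. The function name and all numbers are hypothetical.

```python
def active_fraction(target_uM, protein_uM, fraction_target_shifted):
    """Fraction of a protein preparation that is competent to bind DNA.

    Assumes stoichiometric conditions (target concentration far above Kd),
    so that [active protein] = [shifted target].
    """
    if target_uM <= 0 or protein_uM <= 0:
        raise ValueError("concentrations must be positive")
    if not 0.0 <= fraction_target_shifted <= 1.0:
        raise ValueError("fraction shifted must lie between 0 and 1")
    return target_uM * fraction_target_shifted / protein_uM


# Hypothetical titration: protein added at 100, 50 and 25% of a 1 uM target
target = 1.0
titration = [(1.00, 0.78), (0.50, 0.41), (0.25, 0.20)]  # (protein uM, fraction shifted)
estimates = [active_fraction(target, p, f) for p, f in titration]
mean_active = sum(estimates) / len(estimates)
```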

V. Applications of Engineered Zinc Finger Proteins 

ZFPs that bind to a particular target gene, and the nucleic acids encoding them,
can be used for a variety of applications. These applications include therapeutic methods
in which a ZFP or a nucleic acid encoding it is administered to a subject and used to
modulate the expression of a target gene within the subject. See, for example, co-owned
WO 00/41566. The modulation can be in the form of repression, for example, when the
target gene resides in a pathological infecting microorganism, or in an endogenous gene
of the patient, such as an oncogene or viral receptor, that is contributing to a disease state.
Alternatively, the modulation can be in the form of activation when activation of 
expression or increased expression of an endogenous cellular gene can ameliorate a 
diseased state. For such applications, ZFPs, or more typically, nucleic acids encoding 
them are formulated with a pharmaceutically acceptable carrier as a pharmaceutical 
composition. 

Pharmaceutically acceptable carriers are determined in part by the particular
composition being administered, as well as by the particular method used to administer
the composition (see, e.g., Remington's Pharmaceutical Sciences, 17th ed. 1985). The
ZFPs, alone or in combination with other suitable components, can be made into aerosol
formulations (i.e., they can be "nebulized") to be administered via inhalation. Aerosol
formulations can be placed into pressurized acceptable propellants, such as
dichlorodifluoromethane, propane, nitrogen, and the like. Formulations suitable for
parenteral administration, such as, for example, by intravenous, intramuscular,
intradermal, and subcutaneous routes, include aqueous and non-aqueous, isotonic sterile
injection solutions, which can contain antioxidants, buffers, bacteriostats, and solutes that
render the formulation isotonic with the blood of the intended recipient, and aqueous and
non-aqueous sterile suspensions that can include suspending agents, solubilizers,
thickening agents, stabilizers, and preservatives. Compositions can be administered, for
example, by intravenous infusion, orally, topically, intraperitoneally, intravesically or
intrathecally. The formulations of compounds can be presented in unit-dose or multi-dose
sealed containers, such as ampules and vials. Injection solutions and suspensions
can be prepared from sterile powders, granules, and tablets of the kind previously
described.

The dose administered to a patient should be sufficient to effect a beneficial
therapeutic response in the patient over time. The dose is determined by the efficacy and
Kd of the particular ZFP employed, the target cell, and the condition of the patient, as
well as the body weight or surface area of the patient to be treated. The size of the dose
also is determined by the existence, nature, and extent of any adverse side-effects that
accompany the administration of a particular compound or vector in a particular patient.

In other applications, ZFPs are used in diagnostic methods for sequence-specific
detection of target nucleic acid in a sample. For example, ZFPs can be used to detect
variant alleles associated with a disease or phenotype in patient samples. As an example,
ZFPs can be used to detect the presence of particular mRNA species or cDNA in a
complex mixture of mRNAs or cDNAs. As a further example, ZFPs can be used to
quantify copy number of a gene in a sample. For example, detection of loss of one copy
of a p53 gene in a clinical sample is an indicator of susceptibility to cancer. In a further
example, ZFPs are used to detect the presence of pathological microorganisms in clinical
samples. This is achieved by using one or more ZFPs specific to genes within the
microorganism to be detected. A suitable format for performing diagnostic assays
employs ZFPs linked to a domain that allows immobilization of the ZFP on an ELISA
plate. The immobilized ZFP is contacted with a sample suspected of containing a target
nucleic acid under conditions in which binding can occur. Typically, nucleic acids in the
sample are labeled (e.g., in the course of PCR amplification). Alternatively, unlabelled
probes can be detected using a second labelled probe. After washing, bound, labelled
nucleic acids are detected.

ZFPs also can be used for assays to determine the phenotype and function of gene
expression. Current methodologies for determination of gene function rely primarily
upon either overexpressing the gene of interest or removing it (knocking it out completely)
from its natural biological setting and observing the effects. The phenotypic effects
observed indicate the role of the gene in the biological system.

One advantage of ZFP-mediated regulation of a gene relative to conventional 
knockout analysis is that expression of the ZFP can be placed under small molecule 
control. By controlling expression levels of the ZFPs, one can in turn control the 
expression levels of a gene regulated by the ZFP to determine what degree of repression 
or stimulation of expression is required to achieve a given phenotypic or biochemical 
effect. This approach has particular value for drug development. By putting the ZFP 
under small molecule control, problems of embryonic lethality and developmental 
compensation can be avoided by switching on the ZFP repressor at a later stage in mouse 
development and observing the effects in the adult animal. Transgenic mice having 
target genes regulated by a ZFP can be produced by integration of the nucleic acid 
encoding the ZFP at any site in trans to the target gene. Accordingly, homologous 
recombination is not required for integration of the nucleic acid. Further, because the 
ZFP is trans-dominant, only one chromosomal copy is needed and therefore functional 
knock-out animals can be produced without backcrossing. 

All references cited above are hereby incorporated by reference in their entirety 
for all purposes. 

EXAMPLES 

Example 1: Initial design of zinc finger proteins and determination of 
binding affinity 

Initial ZFP designs were based on existing design rules, correspondence regimes
and ZFP directories, including those disclosed herein (see Tables 1-5) and also in
WO 98/53058; WO 98/53059; WO 98/53060 and co-owned US patent application
Serial No. 09/444,241. See also WO 00/42219. Amino acid sequences were
conceptually designed using amino acids 532-624 of the human transcription factor Sp1
as a backbone. Polynucleotides encoding designed ZFPs were assembled using a
Polymerase Chain Reaction (PCR)-based procedure that utilizes six overlapping
oligonucleotides. PCR products were directly cloned into the tac promoter
vector, pMal-c2 (New England Biolabs, Beverly, MA), using the KpnI and BamHI
restriction sites. The encoded maltose binding protein-ZFP fusion polypeptides were
purified according to the manufacturer's procedures (New England Biolabs, Beverly,
MA). Binding affinity was measured by gel mobility-shift analysis. All of these
procedures are described in detail in co-owned WO 00/41566 and WO 00/42219, as well
as in Zhang et al. (2000) J. Biol. Chem. 275:33,850-33,860 and Liu et al. (2001) J. Biol.
Chem. 276:11,323-11,334; the disclosures of which are hereby incorporated by reference
in their entireties.

Example 2: Optimization of binding specificity by site selection 

Designed ZFPs were tested for binding specificity using site selection methods 
disclosed in parent application USSN 09/716,637. Briefly, designed proteins were 
incubated with a population of labeled, double-stranded oligonucleotides comprising a 
library of all possible 9- or 10-nucleotide target sequences. Five nanomoles of labeled 
oligonucleotides were incubated with protein, at a protein concentration 4-fold above its
Kd for its target sequence. The mixture was subjected to gel electrophoresis, and bound
oligonucleotides were identified by mobility shift and extracted from the gel. The
purified bound oligonucleotides were amplified, and the amplification products were
used for a subsequent round of selection. At each round of selection, the protein
concentration was decreased by 2-fold. After 3-5 rounds of selection, amplification
products were cloned into the TOPO TA cloning vector (Invitrogen, Carlsbad, CA), and
the nucleotide sequences of approximately 20 clones were determined. The identities of 
the target sites bound by a designed protein were determined from the sequences and 
expressed as a compilation of subsite binding sequences. 
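
The bookkeeping behind this procedure can be sketched in Python as follows. The sketch assumes 9-nucleotide sites written 5' to 3', with finger 3 contacting the 5'-most triplet and finger 1 the 3'-most triplet (the arrangement used for EP2C in Example 5). The function names and the clone sequences are hypothetical and serve only to show how a compilation of subsite binding sequences could be tabulated.

```python
from collections import Counter


def selection_schedule(kd_nM, rounds, start_fold=4.0):
    """Protein concentration per round: start at 4x Kd, halve each round."""
    return [start_fold * kd_nM / (2 ** r) for r in range(rounds)]


def compile_subsites(selected_sites):
    """Count the triplets recovered at each finger's subsite.

    Each site is a 9-nt string read 5'->3'; F3 reads bases 1-3,
    F2 reads bases 4-6, and F1 reads bases 7-9.
    """
    counts = {"F3": Counter(), "F2": Counter(), "F1": Counter()}
    for site in selected_sites:
        site = site.upper()
        if len(site) != 9 or set(site) - set("ACGT"):
            raise ValueError(f"expected a 9-nt DNA site, got {site!r}")
        counts["F3"][site[0:3]] += 1
        counts["F2"][site[3:6]] += 1
        counts["F1"][site[6:9]] += 1
    return counts


# Hypothetical sequenced clones from a final round of selection
clones = ["GCGGTGGCT", "GTGGTGGCT", "GCGGTGGCT", "GTGGTGGCT"]
subsites = compile_subsites(clones)
schedule = selection_schedule(kd_nM=2.0, rounds=4)
```

A finger whose counter is dominated by one triplet is specific for that subsite, while a finger whose counts are split between triplets (here, the hypothetical F3) shows relaxed specificity.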

Example 3: Comparison of site selection results with binding affinity 

To test the correlation between site selection results and the affinity of binding of
a ZFP to various related targets, site selection experiments were conducted on two
three-finger ZFPs, denoted ZFP1 and ZFP2, and the site selection results were compared with
Kd measurements obtained from quantitative gel-mobility shift assays using the same
ZFPs and target sites. Each ZFP was constructed, based on design rules, to bind to a
particular nine-nucleotide target sequence (comprising three three-nucleotide subsites), as
shown in Figure 1. Site selection results and affinity measurements are also shown in
Figure 1. The site selection results showed that fingers 1 and 3 of both the ZFP1 and
ZFP2 proteins preferentially selected their intended target sequences. However, the
second finger of each ZFP preferentially selected a subsite other than the one to which it
was designed to bind (e.g., F2 of ZFP1 was designed to bind TCG, but preferentially
selected GTG; F2 of ZFP2 was designed to bind GGT, but preferentially selected GGA).

To confirm the site selection results, binding affinities of ZFP1 and ZFP2 were
measured (see Example 1, supra), both to their original target sequences and to new
target sequences reflecting the site selection results. For example, the Mt-1 sequence
contains two base changes (compared to the original target sequence for ZFP1) which
result in a change in the sequence of the finger 2 subsite to GTG, reflecting the preferred
finger 2 subsite sequence obtained by site selection. In agreement with the site selection
results, binding of ZFP1 to the Mt-1 sequence is approximately 4-fold stronger than its
binding to the original target sequence (Kd of 12.5 nM compared to a Kd of 50 nM, see
Figure 1).

For ZFP2, the specificity of finger 2 for the 3' base of its target subsite was tested,
since, although this finger was designed to bind GGT, site selection indicated that it
bound preferentially to GGA. Moreover, the site selection results predicted that finger 2
of ZFP2 would bind with approximately equal affinity to GGT and GGC. Accordingly,
target sequences containing GGA (Mt-3) and GGC (Mt-4) at the finger 2 subsite were
constructed, and binding affinities of ZFP2 to these target sequences, and to its original
target sequence (containing GGT at the finger 2 subsite), were compared. In complete
agreement with the site selection results, ZFP2 exhibited the strongest binding affinity for
the target sequence containing GGA at the finger 2 subsite (Kd of 0.5 nM, Figure 1), and
its affinity for target sequences containing either GGT or GGC at the finger 2 subsite was
approximately equal (Kd of 1 nM for both targets, Figure 1). Accordingly, the site
selection method, in addition to being useful for iterative optimization of binding
specificity, can also be used as a useful indicator of binding affinity.

Example 4: Use of site selection to identify position-dependent, GNN-binding
zinc fingers

A large number of engineered ZFPs have been evaluated, by site selection, to 
identify zinc fingers that bind to GNN target subsites. In the course of these studies, it 
became apparent that the binding specificity of a particular zinc finger sequence is, in 
some instances, dependent upon the position of the zinc finger in the protein, and hence 
upon the location of the target subsite within the target sequence. For example, if one 
wishes to design a three-finger zinc finger protein to bind to a target sequence containing 
the triplet subsite GAT, it is necessary to know whether this subsite is the first, second or 
third subsite in the target sequence (i.e., whether the GAT subsite will be bound by the 
first, second or third finger of the protein). Accordingly, over 110 three-finger zinc
finger proteins, containing potential GNN-recognizing zinc fingers in various locations,
have been evaluated by site selection experiments. Generally, several zinc finger
sequences were designed to recognize each GNN triplet, and each design was tested in
each of the F1, F2 and F3 positions through 4 to 6 rounds of selection.

The results of these analyses, shown in Figure 2, provide optimal position- 
dependent zinc finger sequences (the sequences shown represent amino acid residues -1 
through +6 of the recognition helix portion of the finger) for recognition of the 16 GNN 
target subsites, as well as site selection results for these GNN-specific zinc fingers. 
Optimal amino acid sequences for recognition of each GNN subsite from each of three
positions (finger 1, finger 2 or finger 3) are thereby provided.

GNG-binding finger designs 

The amino acid sequence RSDXLXR (position -1 to +6 of the recognition helix)
was found to be optimal for binding to the four GNG triplets, with Asn+3 specifying A as
the middle nucleotide; His+3 specifying G as the middle nucleotide; Ala+3 specifying T as
the middle nucleotide; and Asp+3 specifying C as the middle nucleotide. At the +5
position, Ala, Thr, Ser, and Gln were tested, and all showed similar specificity profiles
by site selection. Interestingly, and in contrast to a previous report (Swirnoff et al. (1995)
Mol. Cell. Biol. 15:2275-2287), site selection results indicated that three
naturally-occurring GCG-binding fingers from zif268 and Sp1, having the amino acid sequences
RSDELTR, RSDELQR, and RSDERKR, were not GCG-specific. Rather, each of these
fingers selected almost equal numbers of GCG and GTG sequences. Analysis of binding
affinity by gel-shift experiments confirmed that finger 3 of zif268, having the sequence
RSDERKR, binds GCG and GTG with approximately equal affinity.
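
The RSDXLXR rule stated above can be written out as a small Python sketch. The mapping of the middle nucleotide to the +3 residue (A to Asn, G to His, T to Ala, C to Asp) and the four residues tested at +5 are taken from the text; the function name `gng_helix` and the default choice of Thr at +5 are illustrative assumptions.

```python
# +3 residue (one-letter code) that specifies the middle nucleotide of a GNG triplet
GNG_PLUS3 = {"A": "N", "G": "H", "T": "A", "C": "D"}

# Residues reported as tested at +5: Ala, Thr, Ser, Gln
PLUS5_TESTED = {"A", "T", "S", "Q"}


def gng_helix(triplet, plus5="T"):
    """Return the RSDXLXR recognition helix (-1 to +6) for a GNG triplet."""
    triplet = triplet.upper()
    if len(triplet) != 3 or triplet[0] != "G" or triplet[2] != "G":
        raise ValueError(f"not a GNG triplet: {triplet!r}")
    if triplet[1] not in GNG_PLUS3:
        raise ValueError(f"middle base must be A, C, G or T: {triplet!r}")
    if plus5 not in PLUS5_TESTED:
        raise ValueError(f"+5 residue {plus5!r} was not among those tested")
    return "RSD" + GNG_PLUS3[triplet[1]] + "L" + plus5 + "R"


helices = {t: gng_helix(t) for t in ("GAG", "GGG", "GTG", "GCG")}
```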

Position dependence of GCA-, GAT-, GGT-, GAA- and GCC-binding fingers

Based on existing design rules, the amino acid sequence QSGDLTR (-1 through
+6) was tested for its ability to bind the GCA triplet from three positions (F1, F2, and F3)
within a three-finger ZFP. Figure 3A shows that the QSGDLTR sequence bound
preferentially to the GCA triplet subsite from the F2 and F3 positions, but not from F1.
In fact, the presence of QSGDLTR at the F1 position of three different three-finger ZFPs
resulted predominantly in selection of GCT. Accordingly, an attempt was made to
redesign this sequence to obtain specificity for GCA from the F1 position. Since the
sequence Q(-1)G(+2)S(+3)R(+6) had previously been selected from a randomized F1 library
using GCA as target (Rebar et al. (1994) Science 263:671-673), a D (Asp) to S (Ser) change
was made at the +3 residue of this finger. The resulting sequence, QSGSLTR, was tested for
its binding specificity by site selection and found to preferentially bind GCA, from the F1
position, in three different ZFPs (see Figure 2).

The QSGSLTR zinc finger, optimized for recognition of the GCA subsite from
the F1 position, was tested for its selectivity when located at the F2 position.
Accordingly, two ZFPs, one containing QSGSLTR at finger 2 and one containing
QSGDLTR at finger 2 (both having identical F1 sequences and identical F3 sequences),
were tested by site selection. The results indicated that, when used at the F2 position,
QSGSLTR bound preferentially to GTA, rather than GCA. Thus, for optimal binding of
a GCA triplet subsite from the F1 position, the amino acid sequence QSGSLTR is
required; while, for optimal binding of the same subsite sequence from F2 or F3,
QSGDLTR should be used. Accordingly, different zinc finger amino acid sequences may
be needed to specify a particular triplet subsite sequence, depending upon the location of
the subsite within the target sequence and, hence, upon the position of the finger in the
protein.

Positional effects were also observed for zinc fingers recognizing GAT and GGT
subsites. The zinc finger amino acid sequence QSSNLAR (-1 through +6) is expected to
bind to GAT, based on design rules. However, this sequence selected GAT only from the
F1 position, and not from the F2 and F3 positions, from which the sequence GAA was
preferentially bound (Figure 3B). Similarly, the amino acid sequence QSSHLTR which,
based on design rules, should bind GGT, selected GGT at the F1 position, but not at the
F2 and F3 positions, from which it preferentially bound GGA (Figure 3C). Conversely,
the amino acid sequence TSGHLVR has previously been disclosed to recognize the
triplet GGT, based on its selection from a randomized library of zif268 finger 2. U.S.
Patent No. 6,140,081. However, TSGHLVR was not specific for the GGT subsite when
located at the F1 position (Figure 3C). These results indicate that the binding specificity
of many fingers is position dependent, and particularly point out that the sequence
specificity of a zinc finger selected from an F2 library may be positionally limited.

The results shown in Figure 2 indicate that recognition of at least GAA and GCC 
triplets by zinc fingers is also position dependent. 

These positional dependences stand in contrast to earlier published work, which
suggested that zinc fingers behaved as independent modules with respect to the sequence
specificity of their binding to DNA. Desjarlais et al. (1993) Proc. Natl. Acad. Sci. USA
90:2256-2260.
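
The practical consequence of position dependence is that a design lookup must be keyed on both the triplet and the finger position. The Python sketch below shows one hypothetical way to organize such a lookup. It contains only the five assignments stated explicitly in this example (GCA from F1, F2 and F3; GAT from F1; GGT from F1); all other combinations return `None` rather than a guess, and the complete set of position-dependent sequences is the one presented in Figure 2. The names `POSITION_SPECIFIC_HELICES`, `split_target` and `helices_for_target` are illustrative.

```python
# (triplet, finger position) -> recognition helix (-1 to +6), as stated in the text
POSITION_SPECIFIC_HELICES = {
    ("GCA", "F1"): "QSGSLTR",
    ("GCA", "F2"): "QSGDLTR",
    ("GCA", "F3"): "QSGDLTR",
    ("GAT", "F1"): "QSSNLAR",
    ("GGT", "F1"): "QSSHLTR",
}


def split_target(target):
    """Split a 9-nt target (5'->3') into subsites: F3 is 5'-most, F1 is 3'-most."""
    target = target.upper()
    if len(target) != 9 or set(target) - set("ACGT"):
        raise ValueError(f"expected a 9-nt DNA target, got {target!r}")
    return {"F3": target[0:3], "F2": target[3:6], "F1": target[6:9]}


def helices_for_target(target):
    """Look up a helix for each finger; None marks combinations not covered here."""
    return {
        position: POSITION_SPECIFIC_HELICES.get((triplet, position))
        for position, triplet in split_target(target).items()
    }


design = helices_for_target("GCAGGTGCA")
```

In this hypothetical target the same GCA triplet receives two different helices depending on whether it is read by F3 or F1, while the GGT subsite at F2 is left unassigned because the text reports that QSSHLTR selects GGA, not GGT, from that position.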

Example 5: Characterization of EP2C 

The engineered zinc finger protein EP2C binds to a target sequence,
GCGGTGGCT, with a dissociation constant (Kd) of 2 nM. Site selection results indicated
that fingers 1 and 2 are highly specific for their target subsites, while finger 3 selects
GCG (its intended target subsite) and GTG at approximately equal frequencies
(Figure 4A). To confirm these observations, the binding affinities of EP2C to its cognate
target sequence, and to variant target sequences, were measured by standard gel-shift
analyses (see Example 1, supra). As standards for comparison, the binding affinities of
Sp1 and zif268 to their respective targets were also measured under the same conditions,
and were determined to be 40 nM for Sp1 (target sequence GGGGCGGGG) and 2 nM
for zif268 (target sequence GCGTGGGCG). Measurements of binding affinities
confirmed that F3 of EP2C bound GTG and GCG equally well (Kd values of 2 nM), but bound
GAG with a two-fold lower affinity (Figure 4B). Finger 2 was very specific for the GTG
triplet, binding 15-fold less tightly to a GGG triplet (compare 2C0 and 2C3 in Figure 4B).
Finger 1 was also very specific for the GCT triplet; it bound with 4-fold lower affinity to
a GAT triplet (2C4) and with 2-fold lower affinity to a GCG triplet (2C5). This example
shows, once again, the high degree of correlation between site selection results and
binding affinities.

Example 6: Evaluation of engineered ZFPs by in vivo functional assays 

To determine whether a correlation exists between the binding affinity of an
engineered ZFP to its target sequence and its functionality in vivo, cell-based reporter
gene assays were used to analyze the functional properties of the engineered ZFP EP2C
(see Example 5, supra). For these assays, a plasmid encoding the EP2C ZFP, fused to a
VP16 transcriptional activation domain, was used to construct a stable cell line
(T-Rex-293™, Invitrogen, Carlsbad, CA) in which expression of EP2C-VP16 is inducible, as
described in Zhang et al., supra. To generate reporter constructs, three tandem copies of
the EP2C target site, or its variants (see Figure 4B, column 2), were inserted between the
MluI and BglII sites of the pGL3 luciferase-encoding vector (Promega, Madison, WI),
upstream of the SV40 promoter. Structures of all reporter constructs were confirmed by
DNA sequencing.

Luciferase reporter assays were performed by co-transfection of luciferase
reporter construct (200 ng) and pCMV-βgal (100 ng, used as an internal control) into the
EP2C cells seeded in 6-well plates. Expression of the EP2C-VP16 transcriptional
activator was induced with doxycycline (0.05 µg/ml) 24 h after transfection of reporter
constructs. Cell lysates were harvested 40 hours post-transfection; luciferase and
β-galactosidase activities were measured by the Dual-Light Reporter Assay System
(Tropix, Bedford, MA), and luciferase activities were normalized to the co-transfected
β-galactosidase activities. The results, shown on the right side of Figure 4B, indicated that
the normalized luciferase activity for each reporter construct was well correlated with the
in vitro binding affinity of EP2C to the target sequence present in the construct. For
example, the target sequences to which EP2C bound with greatest affinity (2C0 and 2C2,
Kd of 2 nM for each) both stimulated the highest levels of luciferase activity, when used
to drive luciferase expression in the reporter construct (Figure 4B). Target sequences to
which EP2C bound with 2-fold lower affinity, 2C1 and 2C5 (Kd of 4 nM for each),
stimulated roughly half the luciferase activity of the 2C0 and 2C2 targets. The 2C3 and
2C4 sequences, for which EP2C showed the lowest in vitro binding affinities, also
yielded the lowest levels of in vivo activity when used to drive luciferase expression.
Target 3B, a sequence to which EP2C does not bind, yielded background levels of
luciferase activity, similar to those obtained with a luciferase-encoding vector lacking
EP2C target sequences (pGL3). Thus there exist good correlations between binding
affinity (as determined by Kd measurement), binding specificity (as determined by site
selection) and in vivo functionality for engineered zinc finger proteins.
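
The normalization step described in this example can be sketched in Python as follows. The raw readings are hypothetical placeholders, not data from Figure 4B, and the function names are illustrative. The sketch divides each luciferase reading by the β-galactosidase reading from the same well and then expresses each construct relative to a chosen reference construct.

```python
def normalized_activity(readings):
    """Divide luciferase by beta-galactosidase activity for each construct.

    readings maps construct name -> (luciferase, beta_galactosidase).
    """
    normalized = {}
    for name, (luciferase, beta_gal) in readings.items():
        if beta_gal <= 0:
            raise ValueError(f"non-positive beta-galactosidase reading for {name}")
        normalized[name] = luciferase / beta_gal
    return normalized


def relative_to(activities, reference):
    """Express each normalized activity as a fraction of the reference construct."""
    ref = activities[reference]
    return {name: value / ref for name, value in activities.items()}


# Hypothetical raw readings (arbitrary light units), for illustration only
raw = {
    "2C0": (8000.0, 100.0),
    "2C1": (3600.0, 90.0),
    "pGL3": (200.0, 100.0),
}
relative = relative_to(normalized_activity(raw), reference="2C0")
```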

#     TARGET      SEQ ID  F1       SEQ ID  F2       SEQ ID  F3       SEQ ID  Kd (nM)
-     -           -       QSGSLTR  147     RSDHLTT  253     ERQHLAT  359     96
-     -           -       QSGSLTR  148     RSDHLTT  254     ERDHLRT  360     64
-     -           -       QSGSLTR  149     RGDHLKD  255     ERQHLAT  361     1000
-     -           -       QSGSLTR  150     RGDHLKD  256     ERDHLRT  362     1000
-     -           -       RSDHLTR  151     DSGHLTR  257     RSDHLQR  363     60
-     -           -       RSDELTR  152     RSDHLTR  258     RSDNLTR  364     3.5
-     -           -       RSDALTR  153     TGGSLAR  259     QSGSLTR  365     95
-     -           -       RSDALTR  154     NRATLAR  260     QSASLTR  366     300
-     -           -       RSDALTR  155     NRATLAR  261     QSGSLTR  367     175
-     -           -       RSDSLLR  156     TGGSLAR  262     QSASLTR  368     112.5
-     -           -       RSDSLLR  157     NRATLAR  263     QSASLTR  369     320
374   GCTGAGGAA   52      QRSNLVR  158     RSDNLTR  264     TSSELQR  370     3.3
375   GAGGAAGAT   53      QQSNLAR  159     QSGNLQR  265     RSDNLTR  371     85
401   GTAGTTGTG   54      RSDALTR  160     TGGSLAR  266     QSASLTR  372     80
403   GTAGTTGTG   55      RSDSLLR  161     NRATLAR  267     QSGSLTR  373     750
421   GTAGTTGTG   56      DSDSLLR  162     TGGSLAR  268     QSGSLTR  374     500
422   GTAGTTGTG   57      RSDSLLR  163     TGGSLTR  269     QSGSLTR  375     200
423   GTAGTTGTG   58      RSDALTR  164     TGGSLAR  270     QRSALAR  376     1000
424   GATGCTGAG   59      RSDNLTR  165     TSSELQR  271     TSANLSR  377     100
425   GATGCTGAG   60      RSDNLTR  166     QSSDLQR  272     QQSNLAR  378     25
426   GATGCTGAG   61      RSDNLTR  167     QSSDLQR  273     TSANLSR  379     5.5
427   GCTGAGGAA   62      QRSNLVR  168     RSDNLTR  274     QSSDLQR  380     1
428   GAAGATGAC   63      DSSNLTR  169     QQSNLAR  275     QRSNLVR  381     120
429   GAAGATGAC   64      DSSNLTR  170     TSANLSR  276     QRSNLVR  382     50
430   GATGACGAC   65      EKANLTR  171     DSSNLTR  277     QQSNLAR  383     250
431   GACGACGGC   66      DSGHLTR  172     DRSNLER  278     DSSNLTR  384     100
432   GACGACGGC   67      DSGHLTR  173     DHANLAR  279     DSSNLTR  385     1000
433   GACGACGGC   68      DSGNLTR  174     DHANLAR  280     DSSNLTR  386     1000
434   GACGGCGTA   69      QSASLTR  175     DSGHLTR  281     EKANLTR  387     152.5
435   GACGGCGTA   70      QSASLTR  176     DSGHLTR  282     ERGNLTR  388     150
436   GACGGCGTA   71      QRSALAR  177     DSGHLTR  283     EKANLTR  389     95

#     TARGET      SEQ ID  F1       SEQ ID  F2       SEQ ID  F3       SEQ ID  Kd (nM)
437   GACGGCGTA   72      QRSALAR  178     DSGHLTR  284     ERGNLTR  390     117.5
438   GAGGGGGCG   73      RSDELTR  179     RSDHLTT  285     RSDNLTR  391     62.5
440   GCCGAGGTGC  74      RSDSLLR  180     RSKNLQR  286     ERGTLAR  392     40
441   GGTGGAGTCA  75      DSGSLTR  181     QSGHLQR  287     TSGHLTR  393     250
445   GTCGCAGTGA  76      RSDSLRR  182     QSSDLQK  288     DSGSLTR  394     1000
450   GACTTGGTGC  77      RSDTLAR  183     RGDALTS  289     DRSNLTR  395     130
453   GGTGGAGTCA  78      DRSALAR  184     QSGHLQR  290     DSSKLSR  396     150
461   GAGTACTGTA  79      QRSHLTT  185     DRSNLRT  291     RSDNLAR  397     120
463   GTGGAGGAGA  80      RSDNLTR  186     RSDNLAR  292     RSDALAR  398     0.5
464   GTGGAGGAGA  81      RSDNLTR  187     RSDNLAR  293     RSDSLAR  399     0.4
466   CAGGCTGCGC  82      RSDDLTR  188     QSSDLQR  294     RSDNLRE  400     65
467   CAGGCTGCGC  83      RSDELTR  189     QSSDLQR  295     RGDHLKD  401     800
468   CAGGCTGCGC  84      RSDDLTR  190     QSSDLQR  296     RGDHLKD  402     42
469   GAAGAGGTCT  85      DRSALAR  191     RSDNLAR  297     QSGNLTR  403     13.5
472   GAGGTCTGGA  86      RSSHLTT  192     DRSALAR  298     RSDNLAR  404     80
476   GGAGAGGATG  87      TTSNLRR  193     RSDNLAR  299     QSDHLTR  405     80
477   GGAGAGGATG  88      TTSNLRR  194     RSDNLAR  300     QRAHLAR  406     100
478   GGAGAGGATG  89      TTSNLRR  195     RSDNLAR  301     QSGHLRR  407     60
479   GTGGCGGACC  90      DSSNLTR  196     RSDELQR  302     RSDALAR  408     8.5
480   GTGGCGGACC  91      DSSNLTR  197     RADTLRR  303     RSDALAR  409     5
483   GAGGGCGAAG  92      QSANLAR  198     ESSKLKR  304     RSDNLAR  410     130
484   GAGGGCGAAG  93      QSDNLAR  199     ESSKLKR  305     RSDNLAR  411     1000
485   GGAGAGGTTT  94      QSSALAR  200     RSDNLAR  306     QRAHLAR  412     110
487   GGAGAGGTTT  95      NRATLAR  201     RSDNLAR  307     QSGHLAR  413     76.9
488   TGGTAGGGGG  96      RSDHLAR  202     RSDNLTT  308     RSDHLTT  414     35
490   TAGGGGGTGG  97      RSDSLLR  203     RSDHLTR  309     RSDNLTT  415     1.5
503   GCCGAGGTGC  98      RSDSLLR  204     RSDNLAR  310     ERGTLAR  416     50
504   GCCGAGGTGC  99      RSDSLLR  205     RSDNLAR  311     DRSDLTR  417     25
505   GCCGAGGTGC  100     RSDSLLR  206     RSDNLAR  312     DCRDLAR  418     65
526   GCGGGCGGGC  101     RSDHLTR  207     ERGHLTR  313     RSDTLKK  419     8
543   GAGTGTGTGA  102     RSDLLQR  208     MSHHLKE  314     RSDHLSR  420     50
544   GAGTGTGTGA  103     RSDSLLR  209     MSHHLKE  315     RSDNLAR  421     125
545   GAGTGTGTGA  104     RKDSLVR  210     TSDHLAS  316     RSDNLTR  422     32
546   GAGTGTGTGA  105     RSDLLQR  211     MSHHLKT  317     RLDGLRT  423     500
547   GAGTGTGTGA  106     RKDSLVR  212     TSGHLTS  318     RSDNLTR  424     500
548   GAGTGTGTGA  107     RSSLLQR  213     MSHHLKT  319     RSDHLSR  425     500
549   GAGTGTGTGA  108     RSSLLQR  214     MSHHLKE  320     RSDHLSR  426     500
550   GAGTGTGTGA  109     RKDSLVR  215     TKDHLAS  321     RSDNLTR  427     20
551   GAGTGTGTGA  110     RSDLLQR  216     MSHHLKT  322     RSDHLSR  428     50
552   GAGTGTGTGA  111     RKDSLVR  217     MSHHLKT  323     RSDNLTR  429     31
553   GAGTGTGTGA  112     RSDSLLR  218     MSHHLKE  324     RSDNLTR  430     125
554   GAGTGTGTGA  113     RKDSLVR  219     TSDHLAS  325     RSDNLAR  431     62.5
558   TGCGGGGCA   114     QSGDLTR  220     RSDHLTR  326     DSGHLAS  432     21
559   GAGTGTGTGA  115     RSDSLLR  221     TSDHLAS  327     RSDNLAR  433     1000
560   GAGTGTGTGA  116     RSSLLQR  222     MSHHLKT  328     RSDHLSR  434     500
561   GAGTGTGTGA  117     RKDSLVR  223     MSHHLKE  329     RSDNLAR  435     1000
562   GAGTGTGTGA  118     RSDSLLR  224     TSGHLTS  330     RSDNLAR  436     1000
565   GATGCTGAG   119     RSDNLTR  225     TSSELQR  331     QQSNLAR  437     100
567   GAAGATGAC   120     EKANLTR  226     TSANLSR  332     QRSNLVR  438     47.5
568   GATGACGAC   121     EKANLTR  227     DSSNLTR  333     TSANLSR  439     300
569   GTAGTTGTG   122     RSDSLLR  228     TGGSLAR  334     QRSALTR  440     52
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'Hi:? 



e ?. " 







SEQ 




bbQ 








bEQ 




CT504I' 


TARGET 


T "Pi 


Fl 


T 

IJJ 








X JJ 


V niYl ; 


201 


GCAGCCTTG 


A A -y 

441 


RSDSLTS 


646 


ERSILIR 


o bl 


QRADLRR 


1 n IT /!r 

1056 



1000 


202 


GCAGCCTTG 


442 


RSDSLTS 


647 


ERb ILIK 


852



QRADLAR 


1057


1000


203 


GCAGCCTTG 


443 


RSDSLTS 


648 


ERb ILlK 


853


QRAILRR 


1058


1000


204


GCAGCCTTG 


444 


RSDSLTS 


649


ERb ILlK 


854



QKAiLAR 


1059


1000


205


GAGGTAGAA 


445



QbANLAK 


650


QbAlLAK 


855


KbDJMLbK 


1060


O A 
O U 


206 



GAGGTAGAA 


446 



QSANLAR 


651 



QbAVLAK 


856


KbDJNLbK 


1061


1000



207 


GAGTGGTTA 


447



QRAbLAb 





KbDrlLl 1 


857



KbDJNLAK 


1062


■"7 A 


208 


TAGGTCTTA 


448


QRASLAb 


653 


DRb ALAK 


858



KbDJNLAb 


1063


1000



209 


GGAGTGGTT 


449



QSSALAR 


654 



RbDALAK 


859



QKAHLAK 


1064


3 b 


210 


GGAGTGGTT 


450 


NRDTLAR 


655 



RbDALAR 


860



QRAHLAR 


1065


6 b 


211 


GGAGTGGTT 


451 



QSSALAR 


656 



RSDALAS 


861 



QRAHLAR 


1066 


140


212 



GGAGTGGTT 


452 



NRDTLAR 


657 



RSDALAS 


862



QRAHLAR 


1067


400


213 


GTTGCTGGA 


453 



QRAHLAR 


658 


QSSTLAR 


863



QbbALAR 


1068



1000 


214 


GTTGCTGGA 


454 


QRAHLAR 


659 


QSSTLAR 


864 


NRDTLAR 


1069 



1000 


215 


GAAGTCTGT 


455 


NRDHLMV 


660 



DRSALAR 


865 


QSANLSR 


1070 


1000 


216 


GAAGTCTGT 


456 


NRDHLTT 


661 



DRSALAR 


866 


QSANLSR 


1071 


1000 


217 



GAGGTCGTA 


457 



QRSALAR 


662 



DRSALAR 



867 



RSDNLAR 


1072 


40


219 



GATGTTGAT 


458 


QQSNLAR 


663 



NRDTLAR 


868 



NRDNLSR 


1073 


1000 


220 



GATGTTGAT 


459 


QQSNLAR 



664 



NRDTLAR 


869 


QQSNLSR 


1074 


1000 


221 


GATGAGTAC 


460 


DRSNLRT 


665 


RSDNLAR 


870 


NRDNLAR 


1075 


1000 


222 


GATGAGTAC 


461 


ERSNLRT 


666 


RSDNLAR 


871 


NRDNLAR 


1076 


1000 


223 


GATGAGTAC 


462 


DRSNLRT 


667 


RSDNLAR 


872 


QQSNLAR 


1077 


105 


224 


GATGAGTAC 


463 


ERSNLRT 


668 


RSDNLAR 


873 


QQSNLAR 


1078 


1000 


225 


TGGGAGGTC 


464 


DRSALAR 


669 


RSDNLAR 


874 


RSDHLTT 


1079 


6 


226 


GCAGCCTTG 


465 


RGDALTS 


670 


ERGTLAR 


875 


QSGSLTR 


1080 


1000 


227 


GCAGCCTTG 


466 


RGDALTV 


671 


ERGTLAR 


876 


QSGSLTR 


1081 


1000 
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228 


GCAGCCTTG 


467 


RGDALTM 


672 


ERGTLAR 


877 


QSGSLTR 


1082 


1000 


229 


GCAGCCTTG 


468 


RGDALTS 


673 


ERGTLAR 


878 


RSDELTR 


1083 


1000 


230 


GCAGCCTTG 


469 


RGDALTV 


674 


ERGTLAR 


879 


RSDELTR 


1084 


1000 


231 


GCAGCCTTG 


470 


RGDALTM 


675 


ERGTLAR 


880 


RSDELTR 


1085 


1000 


232 


GGTGTGGTG 


471 


RSDALTR 


676 


RSDALAR 


881 


NRSHLAR 


1086 


50 


233 


GGTGTGGTG 


472 


RSDALTR 


677 


RSDALAR 


882 


QASHLAR


1087 


100 


235 


GTAGAGGTG 


473 


RSDALTR 


678 


RSDNLAR 


883 


QRGALAR 


1088 


80 


236 


GGGGAGGGG 


474 


RSDHLAR 


679 


RSDNLAR 


884 


RSDHLSR 


1089 


0.3


237 


GGGGAGGCC 


475 


ERGTLAR 


680 


RSDNLAR 


885 


RSDHLSR 


1090 


0.3


238 


GGGGAGGCC 


476 


ERGTLAR 


681 


RSDNLQR 


886 


RSDHLSR 


1091 


0.8


239 


GGCGGGGAG 


477 


RSDNLTR 


682 


RSDHLTR 


887 


DRSHLAR 


1092 


0.4


240 


GCAGGGGAG 


478 


RSDNLTR 


683 


RSDHLSR 


888 


QSGSLTR 


1093 


1 


242 


GGGGGTGCT 


479 


QSSDLRR 


684 


QSSHLAR 


889 


RSDHLSR 


1094 


1 


243 


GTGGGCGCT 


480 


QSSDLRR 


685 


DRSHLAR 


890 


RSDALAR 


1095 


75 


244 


TAAGAAGGG 


481 


RSDHLAR 


686 


QSGNLTR 


891 


QSGNLRT 


1096 


100 


245 


TAAGAAGGG 


482 


RSDHLAR 


687 


QSANLTR 


892 


QSGNLRT 


1097 


235 


246 


GAAGGGGAG 


483 


RSDNLAR 


688 


RSDHLAR 


893 


QSGNLTR 


1098 


2 


247 


GAAGGGGAG 


484 


RSDNLAR 


689 


RSDHLAR 


894 


QSGNLRR 


1099 


2 


276 


GCGGCCGCG 


485 


RSDELTR 


690 


ERGTLAR 


895 


RSDERKR 


1100 


90 


277 


GCGGCCGCG 


486 


RSDELTR 


691 


DRSSLTR 


896 


RSDERKR 


1101 


107 


278 


GCGGCCGCG 


487 


QSWELTR 


692 


ERGTLAR 


897 


RSDERKR 


1102 


190 


279 


GCGGCCGCG 


488 


QSWELTR 


693 


DRSSLTR 


898 


RSDERKR 


1103 


260 


280 


GCGGCCGCG 


489 


QSGSLTR 


694 


ERGTLAR 


899 


RSDERKR 


1104 


160 


281 


GCGGCCGCG 


490 


QSGSLTR 


695 


DRSSLTR 


900 


RSDERKR 


1105 


225 


282 


GCAGAAGTG 


491 


RGDALTR 


696 


QSANLTR 


901 


QSADLAR 


1106 


1000 


283 


GCAGAAGTG 


492 


RSDALTR 


697 


QSGNLTR 


902 


QSGSLTR 


1107 


2 


284 


GCGGCCGCG 


493 


QSGSLTR 


698 


RSDHLTT 


903 


RSDERKR 


1108 


1000 


285 


TGTGCGGCC 


494 


ERGTLAR 


699 


RSDELTR 


904 


SRDHLQS 


1109 


1000 


287 


GCAGAAGCG 


495 


RGPDLAR 


700 


QSANLTR 


905 


QSGSLTR 


1110 


1000 


288 


GCAGAAGCG 


496 


RGPDLAR 


701 


QSANLTR 


906 


QSGSLTR 


1111 


1000 


289 


GCAGAAGCG 


497 


RGPDLAR 


702 


QSGNLQR 


907 


QSGSLTR 


1112 


800 
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o n r\ 




498 


RSDELjAR 



703 


QSANLQR 



908 



QSADLAR 



1113 


1 

1000 


292 


GCAGAAGCG 


499 


RSDELTR 


704 


QSANLQR 



909 


QSGSLTR 



1114 


1000 


293 


GTGTGCGGC 


500 


DRSHLTR 



705 


ERHSLQT 


910 



RSDALTR 


1115 


320 


296 


TGCGCGGCC 


501 


ERGTLAR 


706 


RSDELTR 



911 



DRDHLQS 


111 

1116 


1000 


297 


TGCGCGGCC 


502 



ERGTLAR



707 



RSDELRR 



912 


DRSHLQT 


"111 *~j 
1117 


500 


o o o 




cr n o 


y iGELlKK 


n A 0 
/ Uo 


KbDJNlLyK 


A T O 

913 


TbGDLSR 


I 1 T A 

1118 


4000 


299


GCTTAGGCA 


504 


QrSDLiRR 



709 


RSDNLQK 



914 



QSSDLQR 


1119 


4000 


300


GCTTAGGCA


505


y lADLiKK 


""7 T A 
710


RSDNLQK 


915 



QSSDLSR 


1120


400 


301


GCTTAGGCA 



506 


QSADLRR 


711 


RSDNLQT 



916 


QSSDLSR 


1121 


350 


302


GCTTAGGCA 


507 


QSGSLTR


712 


RSDNLQT 


917 


QSSDLSR 


1122 


75 


303 


GCTTAGGCA 



508 


QTGSLTR 


713 



RSDNLQT 


918 


QSSDLSR 


1123 


135 


304 


GCTTAGGCA 



509 


QTADLTR 


714 



RSDNLQT 


919 


QSSDLSR 


1124 



230 


305 


GCTTAGGCA 


510 


QTGDLTR 


715 


RSDNLQT 


920 


QSSDLSR 


1125 


230 


306


GCTTAGGCA 


511 


QTASLTR 


716 



RSDNLQT 


921 


QSSDLSR 


1126 


280 


307


GAAGAAGCG 


512 


RSDELRR 


717 



QSGNLQR 


922 


QSGNLSR 


1127 


50.5


308 


GAAGAAGCG 


513 


RSDELRR 


718 



QSANLQR 


923 


QSANLQR 


1128 


1000 


309 


GGAGATGCC 


514 


ERSDLRR 


719 


QSSNLQR 


924 


QSGHLSR 


1129 


4000 


310 



GGAGATGCC 


515 


DRSDLTR 



720 


NRDNLQT 


925 



QSGHLSR 


1130 


1000 


311 



GGAGATGCC 


516 


DRSTLTR 


721 


NRDNLQR 


926 



QSGHLSR 


1131 


170 


312



GGAGATGCC 



517 



ERGTLAR 


722 



NRDNLQR 


927 


QSGHLSR 


1132 



2000 


313 


GGAGATGCC 



518 


DRSDLTR 


723 


QRSNLQR 


928 


QSGHLSR 


1133 



1000 


314 


GGAGATGCC 


519 



DRSSLTR 


724 


QSSNLQR 


929 


QSGHLSR 


1134 


117.5


315 


GGAGATGCC 


520 



ERGTLAR 


725 


QSSNLQR 


930 


QSGHLSR 


1135 


265 


316 


GGAGATGCC 


521 



ERGTLAR 


726 


QRDNLQR 


931 


QSGHLSR 


1136 


3000 


318 


TAGGAGATGC 522 


RSDALTS 



727 



RSDNLAR 


932 


RSDNLAS 


1137 



100 


319 


GGGGAAGGG 


523 



KTSHLRA 


•"T /-\ 

728 


QSGNLSR 


933 



RSDHLSR 


1138 


125 


320


GGGGAAGGG 


524 



RSDHLTR 


729 


QSGNLSR 


934 


RSDHLSR 


1139 


5 


321 


GGCGGAGAT 


525 


TTSNLRR 


730 


QSGHLQR 


935 


DRSHLTR 


1140 


200 


323 


GGCGGAGAT 


526 


TTSNLRR 


731 


QSGHLQR 


936 


DRDHLTR 


1141 


600 


324 


GGCGGAGAT 


527 


TTSNLRR 


732 


QSGHLQR 


937 


DRDHLTR 


1142 


200 


325 


GTATCTGCT 


528 


NSSDLTR 


733 


NSDVLTS 


938 


QSDVLTR 


1143 


1000 






8325-00011.20 
S11-US2 



326 


GTATCTGTT 


529 


NSDALTR 


734 


NSDVLTS 


939 


QSDVLTR 



1144 



1000 


327 


TCTGCTGGG 


530 


RSDHLTR 


735 


NSADLTR 



940 


NSDDLTR 


1145 


1000 


328 


TCTGTTGGG 


531 


RSDHLTR 


736 


NSSALTS


941 


NSDDLTR 


1146 



1000 


349 


GGTGTCGCC 


532 


DCRDLAR 


737 


DSGSLTR 


942 


TSGHLTR 


1147 


1000 


350 


TCCGAGGGT 


533 


TSGHLTR 


738 



RSDNLTR 


943 


DCRDLTT 


1148 


332 


351 


GCTGGTGTC 


534 


DSGSLTR 


739 


TSGHLTR 



944 


TLHTLTR 



1149 



1000 


352 


GGAGGGGTG 


535 




740 


RSDHLTR 


945 


QSDHLTR 


1150 


26 


353 


GTTGGAGCC 



536 


DCRDLAR 



741 


QSDHLTR 


946 



TSGALTR 


1151 


1000 


354 


GAAGAGGAC 


537 


DSSNLTR 


742 


RSDNLTR 



947 


QRSNLVR 


1152 


28 


355 


GAAGAGGAC 


538 


EKANLTR 


743 


RSDNLTR 


948 


QRSNLVR 


1153 



20 


356 


GGCTGGGCG 


539 


RSDELRR 



744 


RSDHLTK 


949 



DSDHLSR 



1154 


1000 


357 


GGCTGGGCG 


540 


RSDELRR 


745 


RSDHLTK 


950 



DSDHLSR 


1155 



1000 


358 


GGCTGGGCG 


541 


RSDELRR 


746 


RSDHLTK 


951 


DSSHLSR 


1156 


225 


361 


GGGTTTGGG 


542 


RSDHLTR 




QSSALTR 


952 


RSDHLTR 


1157 


130 


363 


GGGTTTGGG 


543 


RSDHLTR 


748 


QSSVLTR 


953 


RSDHLTR 


1158 


200 


364 


GTGTCCGAAG 


544 


RSDNLTR 


749 


DSAVLTT 



954 


RSDSLTR 


1159 


1000 


365 


GGTGCTGGT 


545 


QASHLTR 


750 


QASVLTR 


955 


QASHLTR 


1160 


600 


366 


GAGGGTGCT 


546 


QASVLTR 


751 


QASHLTR 


956 


RSDNLTR 


1161 


1000 


367 


GGGGGCGGG 


547 


RSDHLTR 


752 


DSGHLTR 


957 


RSDHLQR 


1162 


60 


368 


GAGGGGGCG 


548 


RSDELTR 


753 


RSDHLTR 


958 


RSDNLTR 


1163 


3.5


369 


GTAGTTGTG 


549 


RSDALTR 



754 


TGGSLAR 


959 


QSGSLTR 


1164 


95 


370 


GTAGTTGTG 


550 


RSDALTR 


755 


NRATLAR 


960 


>^ TV T* I'n x^ 

QSASLTR 


1165 


300 


371 


GTAGTTGTG 


551 



RSDALTR 



756 



NRATLAR 


961 


QSGSLTR 


1166 


175 


372 


GTAGTTGTG 


552 


RSDSLLR 


757 


TGGSLAR 


962 


QSASLTR 


1167 


^lik * *^ 


373 


GTAGTTGTG 


553 


RSDSLLR 


758 


NRATLAR 


963 


QSASLTR 


1168 


320 


374 


GCTGAGGAA 


554 


QRSNLVR 


759 


RSDNLTR 


964 


TSSELQR 


1169 


3.3


375 


GAGGAAGAT 


555 


QQSNLAR 


760 


QSGNLQR 


965 



RSDNLTR 


1170 


85 


377 


GTGTTGGCAG 


556 


QSGSLTR 


761 


RGDALTS 


966 


RSDALTR 


1171 


89 


378 


GCCGAGGAGA 


557 


RSDNLTR 


762 


RSDNLTR 


967 


DRSSLTR 


1172 


31 


379 


GCCGAGGAGA 


558 


RSDNLTR 


763 


RSDNLTR 


968 


ERGTLAR 


1173 


3 


380 


GAGTCGGAAG 


559 


QSANLAR 


764 


RSDELTT 


969 


RSDNLAR 


1174 


1000 
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381 


GCAGCTGCGC 


560 


RSDELTR 765 


QSSDLQR 970 


QSGDLTR 1175 


1.5




383 


TGGTTGGTAT 


561 


QSATLAR 766


RGDALTS 971 


RSDHLTT 1176 


1000 




384 


GTGGGCTTCA 


562 


DRSALTT 767 


DRSHLAR 972 


RSDALAR 1177 


60 




385 


GGGGCGGAGC 


563 


RSDNLTR 768 


RSDTLKK 973 


RSDHLSR 1178 


1.2




386 


GGGGCGGAGC 


564 


RSDNLTR 769 


RSDELQR 974 


RSDHLSR 1179 


0.4




387 


GGCGAGGCAA 


565 


QSGSLTR 770 


RSDNLAR 975 


DRSHLAR 1180 


2.5




388 


GGCGAGGCAA 


566 


QSGDLTR 771 


RSDNLAR 976 


DRSHLAR 1181 


28 




390 


GTGGCAGCGG 


567 


RSDTLKK 772 


QSSDLQK 977 


RSDALAR 1182 


20




392 


GTGGCAGCGG 


568 


RSDELTR 773 


QSSDLQK 978 


RSDALAR 1183 


1000 




396 


GCGGGAGCAG 


569 


QSGSLTR 774 


QSGHLQR 979


RSDTLKK 1184 


18.8




397 


GCGGGAGCAG 


570 


QSGDLTR 775 


QSGHLQR 980 


RSDTLKK 1185 


25 




400 


TCAGTGGTGG 


571 


RSDALAR 776 


RSDSLAR 981 


QSGDLRT 1186 


40




405 


GCGGCCGCA 


572 


RSDELTR 777 


ERGTLAR 982 


RSDERKR 1187 


110 




406 


GCGGCCGCA 


573 


RSDELTR 778 


DRSSLTR 983 


RSDERKR 1188 



110 


>f 


407 


GCGGCCGCA 


574 


QSWELTR 779 


ERGTLAR 984 


RSDERKR 1189 


410 




408 


GCGGCCGCA 


575 


QSWELTR 780 


DRSSLTR 985 


RSDERKR 1190 


380 




409 


GCGGCCGCA 


576 


QSGSLTR 781 


ERGTLAR 986 


RSDERKR 1191 


50 




410 


GCAGAAGTC 


577 


RSDALTR 782 


QSGNLTR 987 


QSGSLTR 1192 


3 




411 


GCGGCCGCA 


578 


QSGSLTR 783 


RSDHLTT 988 


RSDERKR 1193 


1000 




412 


GCGTGGGCG 


579 


QSGSLTR 784 


RSDHLTT 989 


RSDERKR 1194 


5 




413 


GCGTGGGCA 


580 


QSGSLTR 785 


RSDHLTT 990 


RSDERKR 1195 


5 




414 


GCAGAAGCA 


581 


RSDELTR 786 


QSANLQR 991 


QSGSLTR 1196 


1000 




415 


GTGTGCGGA 


582 


DRSHLTR 787 


ERHSLQT 992 


RSDALTR 1197 


1000 




416 


TGTGCGGCC 


583 


ERGTLAR 788


RSDELRR 993 


DRSHLQT 1198 


1000 




493 


GGGGTGGCGG 584 


RSDTLKK 789 


RSDSLAR 994 


RSDHLSR 1199


300 




494 


GCCGAGGAGA 585 


RSDNLTR 790


RSDNLTR 995 


DRSSLTR 1200


90 




496 


GGTGGTGGC 


586 


DTSHLRR 791


TSGHLQR 996 


TSGHLSR 1201


1000 




497 


GTTTGCGTC 


587 


ETASLRR 792


DSAHLQR 997 


TSSALSR 1202 


1000 




498 


GAAGAGGCA 


588 


QTGELRR 793 


RSDNLQR 998 


QSGNLSR 1203


30 




499 


GCTTGTGAG 


589 


RTSNLRR 794


TSSHLQK 999 


DTDHLRR 1204


1000 




500 


GCTTGTGAG 


590 


RSDNLTR 795


QSSNLQT 1000


DRSHLAR 1205 


1000 
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501 


GTGGGGGTT 


591 


NRATLAR


796 


RSDHLSR 


1001 



RSDALAR 



1206 


8 


502 


GGGGTGGGA 


592 


QSAHLAR 



797 



RSDALAR 



1002 



RSDHLSR 


1207 


60 


507 


GAGGTAGAGG 


593 


RSDNLAR 



798 



QRSALAR 


1003 



RSDNLAR 



1208 


10 


508 


GAGGTAGAGG 



594 


RSDNLAR 



799 



QSATLAR 


1004 



RSDNLAR 



1209 


10 


509 



GTCGTGTGGC 


595 


RSDHLTT 



800 


RSDALAR 


1005 



DRSALAR 



1210 



100 


510 


GTTGAGGAAG 


596 



QSGNLAR 



801 



RSDNLAR 



1006 



NRATLAR 


1211 


100 


511 


GTTGAGGAAG 



597 



QSGNLAR 



802 



RSDNLAR 


1007 



QSSALAR 


<p« y^ ««« y^ 

1212 



100 


512 


GAGGTGGAAG 


598 


QSGNLAR 


yv yv 

803 


1 TV T" TV X^ 

RSDALAR 


*1 y> yv 

1008 


X^ ^T 1 % "Ik. T T TV 1 V 

RSDNLAR 


1213 


10 


513 


GAGGTGGAAG 


599 


y*X /*! T\ TL TT "TV 

QSANLAR 


yv >l 

804 


^v y* ^v TV T TV ^> 

RSDALAR 


y\ yv 

1009 


V y* T~x^ XT TV 

RSDNLAR 


1214 


1.5


514 


TAGGTGGTGG 


y" y-v y\ 

600 


RSDALTR 


yv y\ ^ 

805 


TV T^ Tl T*^ 

RSDALAR 


1010 


T^ T^TL XT rxtrxT 

RSDNLTT 


1215 


10 


515 


TGGGAGGAGT 


601 


RSDNLTR 


806 


RSDNLTR 


1011 


RSDHLTT 


1216 


0.5


516 


GGAGGAGCT 


602 


TTSELRR 


yv yv 

807 


yv yi yn TXT y^ 

QSGHLQR 


1012 


yv yt y»i t XT /"I ^\ 

QSGHLSR 


1217 


700 


517 


GGAGCTGGGG 


603 


RTDHLRR 


808 


TSSELQR 


1013 


QSGHLSR 


1218 


50 


518 


GGGGGAGGAG 


604 


QTGHLRR 


c\ r\ c\ 

809 


QSGHLQR 


1014 


RSDHLSR 


1219 


30 


519 


y-H y^ y*»i y-*i y^ y^ Tfc yw "tv 

GGGGAGGAGA 


605 


RSDNLAR 


810 


RSDNLSR 


1015 


X^ T^T X X X^ 

RSDHLSR 


1220 


0.3


520 


GGAGGAGAT 


y» y\ y" 

606 


TTANLRR 


yL 44 «4 

811 


yv yi yd -p X X" ✓"v -ff^ 

QSGHLQR 


xi y\ #4 y- 

1016 


yv yi y*( x XX T^ 

QSGHLSR 


1221 


*^ yv yv 

300 


521 


GCAGCAGGA 


607 


QTGHLRR 


812 


QSGELQR 


1017 


QSGELSR 


1222 


1000 


522 


GATGAGGCA 


608 


QTGELRR 


813 


RSDNLQR 


*4 yv ^ yv 

1018 


PT^ > ' f Th ^p^p ^T 

TSANLSR 


1223 


200 


527 


y^ > * y^ J* ■ ^ y^ yM y^ 

GGGGAGGATC 


yv y^ 

609 


TTSNLRR 


y\ «4 ji 

814 


f i^V y^ y^ TT y*N. T N 

RSSNLQR 


«^ ^v i#« y^ 

1019 


1 V y^ T XT y^ 

RSDHLSR 


1224 



2 


528 


GGGGAGGATC 


y^ *i yN 

610 


TTSNLRR 


yv 1 

815 


x^ IL XX* y*v x^ 

RSSNLQR 


1020 


T^ /*^T^XXT y^ ^™v 

RSDHLSR 


*1 y^ y\ 

1225 


10 


529 


y*^ ~P% y^ y*s ys« ■ ■ ■ ■ ■ i y^ y^ y^ 

GAGGCTTGGG 


611 


RTDHLRK 


yv y— 

816 


rx^ TV T y^ 

TSAELQR 


1021 


T^ ^T /^TLXT T^ 

RSSNLSR 


«4 yv yv y" 

1226 


1 yv yv 

1000 


531 


GCGGAGGCTT 


612 


TTGELRR 


817 


RSSNLQR 


1022 


RSDELSR 


1227 


160 


532 


y ij y^ y>^ y^ y^ y^ y** rr^rrn 

GCGGAGGCTT 


613 


yv y< y* yv 

QSSDLQR 


y\ •«« yv 

818 


RSSNLQR 


1023 


T^ '1" V ■ T* y^ 

RSDELSR 


1 #^ yv yv 

1228 


100 


r"-* 

533 


y^ y^ y»id y^ ^ y^ ys* y^ f ■ « « ■ * 

GCGGAGGCTT 


614 


y^ yt y^ y*»k ■ ^ 

QSSDLQR 


y^ ^ y\ 

819 


RSDNLAR 


1 yv y*v Jt 

1024 


%"iv y*^ n tw V y^ ■ 

RSADLSR 


1229 


7 


534 


y^ y^ y*«« y^ y^ y^ y^ i i n 111 

GCGGAGGCTT 


615 


yv y* y* t y^ 

QSSDLQR 


yv yn y\ 

820 


X^ y*l T^N X XT TV 

RSDNLAR 


yv yv 

1025 


RSDDLRR 


4 y\ yv. 

1230 


4 yv 

10 


535 


GCAGCCGGG 


^ y^ 

616 


RTDHLRR 


821 


T — 1 T— V -f- y*N x^ 

ESSDLQR 


1026 


/~\ /~l T — 1 T y~i x~» 

QSGELSR 


1231 


1000 


538 


GCAGAGGCTT 


y^ 1 

617 


QSSDLQR 


822 


RSDNLAR 


1027 


QSGSLTR 


— * yv *^ y> 

1232 


^T 

70 


540 


TGGGCAGGCC 


618 


DRSHLTR 


823 


QSGSLTR 


1028 


RSDHLTT 


1233 


55 


541 


GGGGAGGAT 


619 


TTSNLRR 


824 


RSSNLQR 


1029 


RSDHLSR 


1234 


3 


570 


GGGGAAGGCT 


620 


DSGHLTR 


825 


QRSNLVR 


1030 


RSDHLTR 


1235 


20 


571 


GTGTGTGTGT 


621 


RSDSLTR 


826 


QRSNLVR 


1031 


RSDSLLR 


1236 


1000 
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1 — '"TO 

572 


GCATACGTGG 622


RSDSLLR 827


DKGNLQS 


1032 



QSDDLTR 



1237 



1000 


573 


GCATACGTG 623 


RSDSLLR 828


DKGNLQS 


1033 


QSGDLTR 


1238 


1000 



574 


TACGTGGGGT 624 


RSDHLTR 829


RSDHLTR 


1034 


DKGNLQT 


1239 


25 


575 


TACGTGGGGT 625 


DFSHLTR 830


RSDHLTR 


1035 


DKGNLQT 


1240 


M ^-f y^ 

472 


576 


GAGGGTGTTG 626 


NSDTLAR 831 


y^ TXT ^\ 

TSGHLTR 


1036 


RSDNLTR 


1241 


y^ yik y^ 

200 


577 


GGAGCGGGGA 627


RSDHLSR 832 


RSDELQR 


1037 


X T T 111 T^ 

QSDHLTR 


1242 


y'^V yK y^ 

200 


579 


GGGGTTGAGG 628


X^ 1 STL XT til |"\ ^\ 

RSDNLTR 833


NRDTLAR 


1038 


TSGHLTR 


1243 


y^ y^ 

200 


580 


GGTGTTGGAG 629


y^n TV TTx TV n r\ A 

QRAHLAR 834 


NRDTLAR 


1039 


TSGHLTR 


1244 


1000 


581 


TACGTGGGTT 630


QSSHLTR 835


RSDSLLR 


1040 


DKGNLQT 


1 y^ yi ^ 

1245 


382 


583 


GTAGGGGTTG 631 


NSSALTR 836


RSDHLTR 


1041 


y~\ Ti T 111 T~\ 

QSASLTR 


1 ^X VI 

1246 


46 


584 


GAAGGCGGAG 632 


TV /"iTTT rm~v O 

QAGHLTR 837


T~\T^/~1TTX mX^ 

DKSHLTR 


1042 


y^ y^ ^ XT r 1 1 X*\ 

QSGNLTR 


1247 


•«« yx yv 

1000 


585 


GAAGGCGGAG 633 


QAGHLTR 838


T^/^^^XXT t 1 IT^ 

DSGHLTR 


1043 


y~v y"»i T "T" 111 ff K 

QSGNLTR 


1248 


*4 ^x ^x yN 

1000 


587 


GGGGGTTACG 634 


X^ X ^✓^TL XT 111 

DKGNLQT 839 


rxi^^^^xxT r^nx^ 

TSGHLTR 


1044 


RSDHLSK 


1249 


500 


588 


y~^ y-^* y^ y^ y^ y^ y^ y-" *^ i— 

GGGGGGGGGG 635


X^ X^X X X X^ vi /*\ 

RSDHLSR 840


RSDHLTR 


1045 


f \ y*»i ■ "fc T" T T T ^ 

RSDHLSK 


1250 


^x yx 

30 


589 


y^ "7\ m Ti m y~i /~t m y^ 

GGAGTATGCT 636


T~\^^y^TTX TV *~t *^ yi 1 

DSGHLAS 841 


y^ /"I TV mx TV x^ 

QSATLAR 


1046 


QSDHLTR 


«4 #^ l~~ *4 

1251 


^ ^x yx 

1000 


595 


TGGTTGGTAT 637


QRGSLAR 842 


RGDALTR 


1047 


X^ T — \ T T T 1 1 1 m 

RSDHLTT 


1252 


73.3


597 


m y~i m m y^ m TV ^ '~\ 

TGGTTGGTA 638


QNSAMRK 843 


RGDALTS 


1048 


x^ /^x^xxT m 

RSDHLTT 


1253 


1000 


598 


TGGTTGGTA 639


QRGSLAR 844 


RDGSLTS 


1049 


RSDHLTT 


1254 


1000 


599 


111 y^ y^ f Tl y*^ y"*t y" j| ^\ 

TGGTTGGTA 640


QNSAMRK 845 


f T "V y^^ y*^ ^ f 

RDGSLTS 


1050 


T^ T^ V T "i^ III III 

RSDHLTT 


1255 


1000 


600 


y*»^ Tfc y— »d III y-n y— 1* y— 1# y~ ji 

GAGTCGGAA 641


y^ TV "(l XX TV X> f~\ JK y^ 

QSANLAR 846 


RSDELRT 


1051 


RSDNLAR 


1256 


206.7 


601 


GAGTCGGAA 642 


RSANLTR 847 


RLDGLRT 


1052 


RSDNLAR 


1257 


606.7


602 


GAGTCGGAA 643 


RSANLTR 848 


RQDTLVG 


1053 


RSDNLAR 


1258 


616.7 


603 


GAGTCGGAA 644 


QSGNLAR 849


RSDELRT 


1054 


RSDNLAR 


1259 


166.7 


606 


GGGGAGGATC 645


TTSNLRR 850 


RSDNLQR 


1055 


RSDHLSR 


1260 


0.2 



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TABLES 









SEQ 




SEQ 




SEQ 




SEO 


KQ. 




i-> O Tr 


TARGET 


ID 


Fl 


ID 


F2 


ID 


F3 


ID 






f~i 

897 


GAGGAGGTGA 


1261 



RSDALAR 


1347 



RSDNLAR 


1433 



RSDNLVR 


1519 


0.07 




828 


GCGGAGGACC 


1262


EKANLTR 


1348 


RSDNLAR 


1434 



RSDERKR 


1520 


0.1




884 


GAGGAGGTGA 


1263 


RSDSLTR 


1349 



RSDNLAR 


1435 


RSDNLVR 


1521 


0.15 




817 



GAGGAGGTGA 


1264 



RSDSLTR 


1350 



RSDNLAR 


1436 



RSDNLAR 


1522 



0.31 




666 


GCGGAGGCGC 


1265 


RSDDLTR 


1351 


RSDNLTR 


1437


T-\ (~l x^rxiX J7"X^ 

RSDTLKK 


1523 


0.5





829 


GCGGAGGACC 


1266 


EKANLTR 


1352 


RSDNLAR 


1438


T107~\rnT T^TT" 

RSDTLKK 


1524 


0.52




670 


GACGTGGAGG 



1267 


RSDNLAR 


1353 


T— \ y-i T— \ TV X TV T~4 

RSDALAR 


1439


X^X^ OTVTX 1 1 1 1 ^ 

DRSNLTR 



1525 


0.57 




801 


TV TV /~1 TV m /~1 

AAGGAGTCGC 


1268


RSADLRT 


1354 


n OTMVTX TV T^ 

RSDNLAR 


1440


T^ O TMVTT ^T^/~\ 

RSDNLTQ 


1526 



0.85 




668 


GTGGAGGCCA 


1269


T— 1 /~1 rn T TV "O 

ERGTLAR 


1355 


n OTMVTT TV T> 

RSDNLAR 


1441 


n OT^TV T TV n 

RSDALAR 


1527 


1.13





895 


ATGGATTCAG 


1270


QSHDLTK 


1356 


rn (~» /TKTT T TT^ 

TSGNLVR 


1442 


Tl OT^TV X rrn/^ 

RSDALTQ 


1528 


1.4




799 


GGGGGAGCTG 


1271 


QSSDLQR 


1357 


Ti TTX T — 1 

QRAHLER 


1443 


X^ /"I X^T TX i~l X^ 

RSDHLSR 


1529 


1.85




798 


GGGGGAGCTG 


1272 


QSSDLQR 


1358 


QSGHLQR 


1444 


RSDHLSR 


1530 


3 




842 


y^ TV y^ m /~i y~i m 

GAGGTGGGCT 


1273 


DRSHLTR 


1359 


O X*\ TV T Tl X> 

RSDALAR 


1445 


RSDNLAR 


1531 


5.4




894 


m /"^ TV y^my^/imTV m 

TCAGTGGTAT 


1274 


y-1 TV X TV n 

QRSALAR 


1360 


n t~y T~\ TV X T> 

RSDALSR 



1446 


QSHDLTK 


1532 


6.15




892 


ATGGATTCAG 


1275 


QSHDLTK 


1361 


QQSNLVR 


1447 


n (~i x^ TV X rxi/^ 

RSDALTQ 


1533 


6.2




888 


TCAGTGGTAT 


1276 


QSSSLVR 


1362 


RSDALSR 


1448 


QSHDLTK 


1534 


14 




739 


GCGGGCGGGC 


"1 y^ i—j r-f 

1277 


RSDHLTR 


1363 


T~ix^/~iTTX mx^ 

ERGHLTR 


1449 


*~IX^X^X X*l X^ 

RSDDLRR 


1535 


16.5




850 


y^ TV y^ y^ m /~1 m /™1 

CAGGCTGTGG 


1278 


i~v n TV X rrrn 

RSDALTR 


1364 


y^oox^x mx^ 

QSSDLTR 


1450 


RSDNLRE 


1536 


17 




797 


y^ y^ TV y^ TV /^/~1r~irn/~l 

GCAGAGGCTG 


1279 


QSSDLQR 


1365 


RSDNLAR 



1451 


/^0/^x>x rxix^ 

QSGDLTR 



1537 


17.5




891 


TCAGTGGTAT 


1280 


QSSSLVR 


1366 


RSDALSR 


1452 


QSGSLRT 


1538 


18.5




887 


TCAGTGGTAT 


1281 


QRSALAR 


1367 


RSDALSR 


1453 


QSGDLRT 


1539 


23.75




672 


TCGGACGTGG 


1282 


RSDALAR 


1368 


DRSNLTR 


1454 


RSDELRT 


1540 


24 




836 


GGGGAGGCCC 


1283 


ERGTLAR 


1369 


RSDNLAR 


1455 


RSDHLSR 


1541 


24.25




674 


GCGGCGTCGG 


1284 


RSDELRT 


1370 


RADTLRR 


1456 


RSDTLKK 


1542 


27.5




849 


GGGGCCCTGG 


1285 


RSDALRE 


1371 


DRSSLTR 


1457 


RSDHLTQ 


1543 


29.05




825 


GAATGGGCAG 


1286 


QSGSLTR 


1372 


RSDHLTT 


1458 


QSGNLTR 


1544 


37.3
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r-J 

673 


GCGGGTGTCT 1287


T^n TV T 7\ n 

DRSAIjAR 


1373


r\C^ OtJT TV TD 

QbbHLAK 


1459


KbDiLKK 


1545 


48.33 


848 


GGGGAGGCCC 1288


DRSSLTR 


1374 


n OT\"KTT TV 1"^ 

RSDNLAR 


1460


"O OT^TTT prn 

RSDHLSR 


1546 


49.5 


662 


AGAGCGGCAC 1289


QTGSLTR 


1375


RbDhLQK 


1461


QbCjHLNQ 


1547 


50 


667 


GAGTCGGACG 1290


"T^T^ piTvTT rm~v 

DRSNLTR 


1376 


RSDELRT 


1462 


n T~MVTT TV n 

RSDNLAR 


1548 


50 


803 


GCAGCGGCTC 1291


QSSDLQR 


1377


RSDELQR 


1463


QSGSLTR 


1549 


57.5


671 


TCGGACGAGT 1292


RSDNLAR 


1378 


DRbNLlK 


1464


KbULLKl 


1550 


64 


851 


GAGATGGATC 1293


QSSNLQR 


1379 


RRDVLMN 


1465


RLHNLQR 


1551 


74 


804 


GCAGCGGCTC 1294


QSSDLQR 


1380


RbDULNK 


1466


QbLrbLlK 


1552 


82.5 


669 


GACGAGTCGG 1295


RSDELRT 



1381 


RbDJMLAK 


1467


DKbJMLiK 


1553 


90 


682 


y^ m/~1 /~i TV /~t /~1 TV /"I T O 

GCTGCAGGAG 1296


RSDHLAR 


1382


QSCjDLIK 


1468


QboDLibK 


1554 


90 


845 


GAGATGGATC 1297


QSSNLQR 



1383 


RSDALRQ 


1469 


RLHNLQR 


1555 


112.5


663 


AGAGCGGCAC 1298


QTGSLTR 


1384 


RSDELQR 


1470


T/T^TT*TT>^T /^TV 

KNWKLQA 


1556 


115 


738 


GCGGGGTCCG 1299


T~iT*i/~irnT mm 

ERGTLTT 


■1 O Pl ^ 

1385 


T^P1T~\TTT Pl TV 

RSDHLSR 


"1 y1 T 1 

1471 


RSDDLRR 


1557 


120 


664 


TV y~1 TV y~1 /"i TV 1 T /*\ 

AGAGCGGCAC 1300


QTGSLTR 


1386


RADTLRR 


1472


TV pi C" "D T A Ti 

AbbKLAl 


1558 


125 


833 


GACTAGGACC 1301


T— iTj''Tv TvTT rm~v 

EKANLTR 


1387 


RSDNLTK 


1 y1 •^ o 

1473 


DRSNLTR 


1559 


136 


685 


GCTGCAGGAG 1302


nT^TTX TV T~V 

RSDHLAR 


1388 


QSGSLTR 


1474


PNPiOT\T PiTl 

QSSDLSR 


1560 


150 


835 


m Ti y~i /~i /"I TV y^ m 1 

TAGGGAGCGT 1303


RADTLRR 


T o p^ 

1389 


QSGHLTR 


1475


RbDNLl I 


1561 


150 


847 


TAGGGAGCGT 1304


RSDDLTR 


1 P\ PV 

1390 


QSGHLTR 


1476 


pn~NXTT mm 

RSDNLTT 


1562 


150 


818 


GAATGGGCAG 1305


QSGSLTR 


1 PV "1 

1391 


n piT^TTT mm 

RSDHLTT 


1477 


PX pi OTVTT T Tn 

QSSNLVR 


1563 


167 


834 


GACTAGGACC 1306


EKANLTR 


1392 


T*» pn~^TTT rnm 

RSDHLTT 


1 yl •^ P» 

1478 


DRSNLTR 


1564 


186 


837 


GGGGCCCTGG 1307


RSDALRE 


1393


DRSSLTR 


1479


RSDHLbR 


1565 


^2 ^2 


764 


GCAGAGGCTG 1308


TSGELVR 


1394


T~l pi T^TVTT TV n 

RSDNLAR 


1480


QSGDLTR 


1566 


255 


774 


TV y^ y^y^ y^ m TV /~l "1 

GCAGCGGTAG 1309


pi TV T "A T~» 

QRSALAR 


1395 


RSDELQR 


1481 


/^PiP^T^T Tin 

QSGDLTR 


1567 


258 


765 


GCCGAGGCCG 1310 


Tr'Tl/^'rnT TV n 

ERGTLAR 


1396


"D O TMVTT 7V "n 

RbDJMLAR 


1482


T?T5/^rpx TV o 
IIjKCjILAK 


1568 


262.5


766 


/^y^/~l/~1TV y^ y~l /~t y~1 y^ T 1 

GCCGAGGCCG 1311 


TTn/~tfTIT TV T\ 

ERGTLAR 


1397


P*T~M^TT TV T~i 

RSDNLAR 


1483 


T~\T~i OT^T rrrn 

DRSDLTR 


1569 


262.5


775 


y~< y~l TV y^ y^ y~1 y~1 m TV y^ 1 

GCAGCGGTAG 1312 


y^ PI /"^ TV T m 

QSGALTR 


■T O Pv PV 

1398 


RSDELQR 


1484 


/^pi/™tT^T rm~v 

QSGDLTR 


1570 


265 


763 


GCAGAGGCTG 1313 


m PI y*^ "n T T Ti~\ 

TSGELVR 


•1 o p\ 
1399 


1"^ PI T^TVTX TV 

RSDNLAR 


1485 


y*\ PI /~i i~\ T mi^ 

QSGSLTR 


1571 


275 


838 


GGGGCCCTGG 1314 


T~t PI T~\ TV T T~\ T~l 

RSDALRE 


1400 


T*vnpiPiT rm~v 

DRSSLTR 


1486 


RSDHLTA 


1572 


300 


841 


GAGTGTGAGG 1315 


RSDNLAR 


1401 


QSSHLAS 


1487 


RSDNLAR 


1573 


300 


770 


TTGGCAGCCT 1316 


DRSSLTR 


1402 


QSGSLTR 


1488 


RSDSLTK 


1574 


325 


767 


GGGGGAGCTG 1317 


QSSDLAR 


1403 


QSGHLQR 


1489 


RSDHLSR 


1575 


335 




800 TTGGCAGCCT 1318 ERGTLAR 1404 QSGSLTR 1490 RSDSLTK 1576 400
832 GACTAGGACC 1319 EKANLTR 1405 RSDNLTT 1491 DRSNLTR 1577 408
844 GAGATGGATC 1320 QSSNLQR 1406 RSDALRQ 1492 RSDNLQR 1578 444
683 GCTGCAGGAG 1321 QSGHLAR 1407 QSGSLTR 1493 QSSDLSR 1579 500
805 GCAGCGGTAG 1322 QRSALAR 1408 RSDELQR 1494 QSGSLTR 1580 500
839 GAGTGTGAGG 1323 RSDNLAR 1409 TSDHLAS 1495 RSDNLAR 1581 625
840 GAGTGTGAGG 1324 RSDNLAR 1410 MSHHLKT 1496 RSDNLAR 1582 625
830 GGAGAGTCGG 1325 RSDELRT 1411 RSDNLAR 1497 QRAHLAR 1583 683
831 GGAGAGTCGG 1326 RSDDLTK 1412 RSDNLAR 1498 QRAHLAR 1584 700
684 GCTGCAGGAG 1327 RSAHLAR 1413 QSGSLTR 1499 QSSDLSR 1585 850
846 GAGATGGATC 1328 QSSNLQR 1414 RRDVLMN 1500 RSDNLQR 1586 889.5
819 AAGTAGGGTG 1329 QSSHLTR 1415 RSDNLTT 1501 RSDNLTQ 1587 1000
820 ACGGTAGTTA 1330 QSSALTR 1416 QRSALAR 1502 RSDTLTQ 1588 1000
821 ACGGTAGTTA 1331 NRATLAR 1417 QRSALAR 1503 RSDTLTQ 1589 1000
822 GTGTGCTGGT 1332 RSDHLTT 1418 ERQHLAT 1504 RSDALAR 1590 1000
823 GTGTGCTGGT 1333 RSDHLTK 1419 ERQHLAT 1505 RSDALAR 1591 1000
824 GTGTGCTGGT 1334 RSDHLTT 1420 DRSHLRT 1506 RSDALAR 1592 1000
885 GTGTGCTGGT 1335 RSDHLTK 1421 DRSHLRT 1507 RSDALAR 1593 1000
886 TCAGTGGTAT 1336 QSSSLVR 1422 RSDALSR 1508 QSGDLRT 1594 1000
889 ATGGATTCAG 1337 QSGSLTT 1423 QQSNLVR 1509 RSDALTQ 1595 1000
890 CTGGTATGTC 1338 QRSHLTT 1424 QRSALAR 1510 RSDALRE 1596 1000
896 AAGTAGGGTG 1339 TSGHLVR 1425 RSDNLTT 1511 RSDNLTQ 1597 1000
898 ACGGTAGTTA 1340 NRATLAR 1426 QSSSLVR 1512 RSDTLTQ 1598 1000
899 CTGGTATGTC 1341 QRSHLTT 1427 QSSSLVR 1513 RSDALRE 1599 1000
900 CTGGTATGTC 1342 MSHHLKE 1428 QSSSLVR 1514 RSDALRE 1600 1000
901 CTGGTATGTC 1343 MSHHLKE 1429 QRSALAR 1515 RSDALRE 1601 1000
773 GCAGCGGTAG 1344 QSGALTR 1430 RSDELQR 1516 QSGSLTR 1602 1250
768 GGGGGAGCTG 1345 QSSDLAR 1431 QRAHLER 1517 RSDHLSR 1603 2000
681 GCTGCAGGAG 1346 RSAHLAR 1432 QSGDLTR 1518 QSSDLSR 1604 3000

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TABLE 4 







SBS# TARGET SEQ ID F1 SEQ ID F2 SEQ ID F3 SEQ ID Kd
607 AAGGTGGCAG 1605 QSGDLTR 1707 RSDSLAR 1809 [illegible] 1911 [illegible]
608 TTGGCTGGGC 1606 GSWHLTR 1708 QSSDLQR 1810 [illegible] 1912 [illegible]
611 GTGGCTGCAG 1607 QSGDLTR 1709 QSSDLQR 1811 [illegible] 1913 [illegible]
612 GTGGCTGCAG 1608 QSGTLTR 1710 QSSDLQR 1812 [illegible] 1914 [illegible]
613 TTGGCTGGGC 1609 RSDHLAR 1711 QSSDLQR 1813 [illegible] 1915 [illegible]
614 TTGGCTGGGC 1610 RSDHLAR 1712 QSSDLQR 1814 [illegible] 1916 [illegible]
616 GAGGAGGATG 1611 QSSNLQR 1713 RSDNLAR 1815 [illegible] 1917 [illegible]
617 AAGGGGGGG 1612 RSDHLSR 1714 RSDHLTR 1816 [illegible] 1918 [illegible]
618 AAGGGGGGG 1613 RSDHLSR 1715 RSDHLTR 1817 [illegible] 1919 [illegible]
619 AAGGGGGGG 1614 RSDHLSR 1716 RSDHLTR 1818 [illegible] 1920 [illegible]
620 AAGGGGGGG 1615 RSDHLSR 1717 RSDHLTR 1819 [illegible] 1921 [illegible]
621 AAGGGGGGG 1616 RSDHLSR 1718 RSDHLTR 1820 [illegible] 1922 [illegible]
624 ACGGATGTCT 1617 DRSALAR 1719 TSANLAR 1821 [illegible] 1923 [illegible]
628 TTGTAGGGGA 1618 RSDHLTR 1720 RSDNLTT 1822 [illegible] 1924 [illegible]
629 TTGTAGGGGA 1619 RSSHLTR 1721 RSDNLTT 1823 [illegible] 1925 [illegible]
630 CGGGGAGAGT 1620 RSDNLAR 1722 QSGHLQR 1824 [illegible] 1926 [illegible]
646 TTGGTGGAAG 1621 QSGNLAR 1723 RSDALAR 1825 [illegible] 1927 [illegible]
647 TTGGTGGAAG 1622 QSANLAR 1724 RSDALAR 1826 [illegible] 1928 [illegible]
651 GTTGTGGAAT 1623 QSGNLSR 1725 RSDALAR 1827 [illegible] 1929 [illegible]
652 TAGGAGGCTG 1624 QSSDLQR 1726 RSDNLAR 1828 RSDNLTT 1930 1.5
653 TAGGAGGCTG 1625 TTSDLTR 1727 RSDNLAR 1829 RSDNLTT 1931 5.5
654 TAGGCATAAA 1626 QSGNLRT 1728 QSGSLTR 1830 RSDNLTT 1932 105
655 TAGGCATAAA 1627 QSGNLRT 1729 QSSTLRR 1831 RSDNLTT 1933 1000
656 TAGGCATAAA 1628 QSGNLRT 1730 QSGSLTR 1832 RSDNLTS 1934 540
657 TAGGCATAAA 1629 QSGNLRT 1731 QSSTLRR 1833 RSDNLTS 1935 300
660 GAGGGAGTTC 1630 NRATLAR 1732 QSGHLTR 1834 RSDNLAR 1936 8.25






661 GAGGGAGTTC 1631 TTSALTR 1733
665 GCGGAGGCGC 1632 RSDDVTR 1734
689 AAGGCGGAGA 1633 RSDNLTR 1735
692 AAGGCGGAGA 1634 RSDNLTR 1736
693 AAGGCGGAGA 1635 RSDNLTR 1737
694 AAGGCGGAGA 1636 RSDNLTR 1738
695 GGGGGCGAGC 1637 RSSNLTR 1739
697 TGAGCGGCGG 1638 RSDELTR 1740
698 TGAGCGGCGG 1639 RSDELTR 1741
699 GCGGCGGCAG 1640 QSGSLTR 1742
700 GCGGCGGCAG 1641 QSGDLTR 1743
701 GCAGCGGAGC 1642 RSDNLAR 1744
702 GCAGCGGAGC 1643 RSDNLAR 1745
704 AAGGTGGCAG 1644 QSGDLTR 1746
705 GGGGTGGGGC 1645 RSDHLAR 1747
706 GGGGTGGGGC 1646 RSDHLAR 1748
708 GAGTCGGAA 1647 QSANLAR 1749
709 GAGTCGGAA 1648 QSANLAR 1750
710 GAGTCGGAA 1649 QSGNLAR 1751
711 GAGTCGGAA 1650 QSGNLAR 1752
712 GGTGAGGAGT 1651 RSDNLAR 1753
713 GGTGAGGAGT 1652 RSDNLAR 1754
714 TGGGTCGCGG 1653 RSDELRR 1755
715 TGGGTCGCGG 1654 RADTLRR 1756
716 TTGGGAGCAC 1655 QSGSLTR 1757
717 TTGGGAGCAC 1656 QSGSLTR 1758
718 TTGGGAGCAC 1657 QSGSLTR 1759
719 GGCATGGTGG 1658 RSDALTR 1760
720 GAAGAGGATG 1659 TTSNLAR 1761
722 ATGGGGGTGG 1660 RSDALTR 1762
724 GGCATGGTGG 1661 RSDALTR 1763
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[illegible] 1835 RSDNLAR 1937 [illegible]
[illegible] 1836 RSDDLRR 1938 [illegible]
[illegible] 1837 RLDNRTA 1939 [illegible]
[illegible] 1838 RSDNLTQ 1940 [illegible]
[illegible] 1839 RLDNRTA 1941 [illegible]
[illegible] 1840 RSDNLTQ 1942 [illegible]
[illegible] 1841 RSDHLTR 1943 [illegible]
[illegible] 1842 QSGHLTK 1944 [illegible]
[illegible] 1843 QSHGLTS 1945 [illegible]
[illegible] 1844
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TABLE 5 



SBS# TARGET SEQ ID F1 SEQ ID F2 SEQ ID F3 SEQ ID Kd
903 ATGGAAGGG 2013 RSDHLAR 2513 QSGNLAR 3013 RSDALRQ 3513 1.027
904 AAGGGTGAC 2014 DSSNLTR 2514 QSSHLAR 3014 RSDNLTQ 3514 1
905 GTGGTGGTG 2015 RSSALTR 2515 RSDSLAR 3015 RSDSLAR 3515 1.15
908 AAGGTCTCA 2016 QSGDLRT 2516 DRSALAR 3016 RSDNLRQ 3516 50
909 GTGGAAGAA 2017 QSGNLSR 2517 QSGNLQR 3017 RSDALAR 3517 16.4
910 ATGGAAGAT 2018 QSSNLAR 2518 QSGNLQR 3018 RSDALAQ 3518 0.03
911 ATGGGTGCA 2019 QSGSLTR 2519 QSSHLAR 3019 RSDALAQ 3519 0.91
912 TCAGAGGTG 2020 RSDSLAR 2520 RSDNLTR 3020 QSGDLRT 3520 0.135
914 CAGGAAAAG 2021 RSDNLTQ 2521 QSGNLAR 3021 RSDNLRE 3521 1.26
915 CAGGAAAAG 2022 RSDNLRQ 2522 QSGNLAR 3022 RSDNLRE 3522 45.15
916 GAGGAAGGA 2023 QSGHLAR 2523 QSGNLAR 3023 RSDNLQR 3523 1.3
919 TCATAGTAG 2024 RSDNLTT 2524 RSDNLRT 3024 QSGDLRT 3524 250
920 GATGTGGTA 2025 QSSSLVR 2525 RSDSLAR 3025 TSANLSR 3525 4
921 AAGGTCTCA 2026 QSGDLRT 2526 DPGALVR 3026 RSDNLRQ 3526 11
922 AAGGTCTCA 2027 QSHDLTK 2527 DRSALAR 3027 RSDNLRQ 3527 4
923 AAGGTCTCA 2028 QSHDLTK 2528 DPGALVR 3028 RSDNLRQ 3528 2
926 GTGGTGGTG 2029 RSDALTR 2529 RSDSLAR 3029 RSDSLAR 3529 7.502
927 CAGGTTGAG 2030 RSDNLAR 2530 TSGSLTR 3030 RSDNLRE 3530 3.61
928 CAGGTTGAG 2031 RSDNLAR 2531 QSSALTR 3031 RSDNLRE 3531 25
929 CAGGTAGAT 2032 QSSNLAR 2532 QSATLAR 3032 RSDNLRE 3532 1.3
931 GAGGAAGAG 2033 RSDNLAR 2533 QSSNLVR 3033 RSDNLAR 3533 2
932 ATGGAAGGG 2034 RSDHLAR 2534 QSSNLVR 3034 RSDALRQ 3534 797
933 GACGAGGAA 2035 QSANLAR 2535 RSDNLAR 3035 DRSNLTR 3535 500
934 ATGGAAGAT 2036 QSSNLAR 2536 QSGNLQR 3036 RSDALTS 3536 0.07
935 ATGGGTGCA 2037 QSGSLTR 2537 QSSHLAR 3037 RSDALTS 3537 0.91
937 GTGGGGGCT 2038 QSSDLTR 2538 RSDHLTR 3038 RSDSLAR 3538 0.03
938 GTGGGGGCT 2039 QSSDLRR 2539 RSDHLTR 3039 RSDSLAR 3539 0.049
939 GGGGGCTGG 2040 RSDHLTT 2540 DRSHLAR 3040 RSDHLSK 3540 0.352
940 GGGGGCTGG 2041 RSDHLTK 2541 DRSHLAR 3041 RSDHLSK 3541 1.5
941 GGGGCTGGG 2042 RSDHLAR 2542 QSSDLRR 3042 RSDKLSR 3542 0.077
942 GGGGCTGGG 2043 RSDHLAR 2543 QSSDLRR 3043 RSDHLSK 3543 0.13
943 GGGGCTGGG 2044 RSDHLAR 2544 TSGELVR 3044 RSDKLSR 3544 0.067
944 GGGGCTGGG 2045 RSDHLAR 2545 TSGELVR 3045 RSDHLSK 3545 0.027
945 GGTGCGGTG 2046 RSDSLTR 2546 RADTLRR 3046 MSHHLSR 3546 0.027
946 GGTGCGGTG 2047 RSDSLTR 2547 RSDVLQR 3047 MSHHLSR 3547 0.027
947 GGTGCGGTG 2048 RSDSLTR 2548 RSDELQR 3048 QSSHLAR 3548 0.013
948 GGTGCGGTG 2049 RSDSLTR 2549 RSDVLQR 3049 QSSHLAR 3549 0.017
962 GAGGCGGCA 2050 QSGSLTR 2550 RSDELQR 3050 RSDNLAR 3550 0.015
963 GAGGCGGCA 2051 QSGSLTR 2551 RSDDLQR 3051 RSDNLAR 3551 0.015
964 GCGGCGGTG 2052 RSDALAR 2552 RSDELQR 3052 RSDERKR 3552 0.041
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ERGDLTR 2553 RSDELQR 3053 RSDERKR 3553 3.1
ERGTLAR 2554 RSDNLSR 3054 RSDNLAR 3554 0.028
DRSSLTR 2555 RSDNLSR 3055 RSDNLAR 3555 0.055
QSGSLTR 2556 DRSSLTR 3056 RSDNLAR 3556 1.4
QSGSLTR 2557 DRSDLTR 3057 RSDNLAR 3557 0.275
ERGTLAR 2558 DRSHLAR 3058 RSDALAR 3558 1.859
DRSSLTR 2559 DRSHLAR 3059 RSDALAR 3559 0.144
ERGDLTR 2560 DRSHLAR 3060 RSDALAR 3560 1.748
DRSALTR 2561 RSDELQR 3061 ERGTLAR 3561 0.6
DRSALTR 2562 RSDELQR 3062 DRSDLTR 3562 0.038
QSSDLTR 2563 DRSSLTR 3063 RSDNLRE 3563 1.1
QSSDLTR 2564 DRSDLTR 3064 RSDNLRE 3564 4.12
RSDSLTR 2565 QSGSLTR 3065 RSDALRE 3565 0.017
RSDSLTR 2566 QSGDLTR 3066 RSDALRE 3566 1.576
RSSDLTR 2567 RSDELQR 3067 RSDALRE 3567 1.59
RSDDLTR 2568 RSDELQR 3068 RSDALRE 3568 2.2
RSDDLTR 2569 RSDELQR 3069 RSDNLRE 3569 0.375
RSDHLTT 2570 DRSHLAR 3070 RSDELRE 3570 0.03
RSDHLTK 2571 DRSHLAR 3071 RSDELRE 3571 1.385
RSDNLAR 2572 DRSHLAR 3072 DRSNLTR 3572 1.6
RSDNLAR 2573 DRSHLAR 3073 EKANLTR 3573 0.965
QSSNLQR 2574 QSSDLQR 3074 MSHHLSR 3574 1.6
QSSNLQR 2575 QSSDLQR 3075 TSGHLVR 3575 33.55
TSGNLVR 2576 QSSDLQR 3076 MSHHLSR 3576 0.15
RSDHLAR 2577 RSDNLAR 3077 MSHHLSR 3577 1.9
DRSHLTR 2578 RSDSLAR 3078 RSDNLTQ 3578 5.35
DRSHLTR 2579 SSGSLVR 3079 RSDNLTQ 3579 0.06
RSDHLAR 2580 TSGELVR 3080 RSDHLSR 3580 3.1
RSDHLTK 2581 DRSHLAR 3081 RSDHLSR 3581 0.03
QSANLAR 2582 RSDNLAR 3082 RSDHLSK 3582 0.08
DRSALAR 2583 RSDALTS 3083 RSDNLRE 3583 9.6
QSSDLTR 2584 RSDNLAR 3084 QSGHLNQ 3584 1.65
RSANLRT 2585 RSDNLTK 3085 RSDTLKQ 3585 0.23
QSSDLTR 2586 RSDNLAR 3086 QSGKLTQ 3586 0.6
DRSALAR 2587 RSDALTR 3087 RSDNLRE 3587 11.15
EKANLTR 2588 QSSDLSR 3088 QRAHLAR 3588 1.8
RSDNLVR 2589 RSDNLAR 3089 RSDERKR 3589 0.028
RSANLRT 2590 RSDNLTK 3090 RSDTLRS 3590 0.118
RSDNLTT 2591 RSDNLTK 3091 RSDTLRS 3591 1.4
RSDDLTR 2592 RSDHLTR 3092 QRASLTR 3592 0.898
QSSNLQR 2593 QSGHLTR 3093 RLHNLAR 3593 167
RSDNLSR 2594 RSDSLTQ 3094 RLHNLAR 3594 0.4
RSDNLSR 2595 RSDSLTQ 3095 RSDNLSR 3595 1.9
QSSNLQR 2596 QSGHLTR 3096 RSDNLAR 3596 8.2
RSADLTR 2597 RSDSLAR 3097 RSDSLTK 3597 0.03
RSDHLTR 2598 QSSSLVR 3098 DRSNLTR 3598 0.032
QSSNLQR 2599 QSGHLNQ 3099 RSDNLAR 3599 0.15
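The rows of Table 5 pair each nine-base target with three recognition helices. A minimal Python sketch of how such a row might be read is given below. It assumes a cleaned row of ten whitespace-separated fields in the order SBS#, target, SEQ ID, F1, SEQ ID, F2, SEQ ID, F3, SEQ ID, Kd, and it assumes that F3 reads the 5'-most triplet of the target as written and F1 the 3'-most triplet, which is consistent with the rows above. The names ZfpRow, parse_row and finger_triplets are hypothetical and are used for illustration only.

```python
from typing import Dict, NamedTuple, Tuple


class ZfpRow(NamedTuple):
    """One table row: SBS number, target site, three helices and the Kd."""
    sbs: str
    target: str
    f1: str
    f2: str
    f3: str
    kd: float


def parse_row(line: str) -> ZfpRow:
    """Parse a cleaned row; the SEQ ID columns are read but not kept."""
    fields = line.split()
    if len(fields) != 10:
        raise ValueError(f"expected 10 fields, got {len(fields)}: {line!r}")
    sbs, target, _, f1, _, f2, _, f3, _, kd = fields
    return ZfpRow(sbs, target, f1, f2, f3, float(kd))


def finger_triplets(row: ZfpRow) -> Dict[str, Tuple[str, str]]:
    """Pair each finger with a target triplet.

    Assumption (not stated in the table itself): F3 reads the 5'-most
    triplet and F1 the 3'-most triplet of the target as written.
    """
    if len(row.target) != 9:
        raise ValueError("this sketch handles nine-base targets only")
    five_to_three = [row.target[i:i + 3] for i in range(0, 9, 3)]
    return {
        "F3": (five_to_three[0], row.f3),
        "F2": (five_to_three[1], row.f2),
        "F1": (five_to_three[2], row.f1),
    }


if __name__ == "__main__":
    example = parse_row(
        "910 ATGGAAGAT 2018 QSSNLAR 2518 QSGNLQR 3018 RSDALAQ 3518 0.03"
    )
    for finger, (triplet, helix) in finger_triplets(example).items():
        print(finger, triplet, helix)
```

Rows whose target has ten bases, as in Table 4, are deliberately rejected by this sketch, since the table alone does not say how the extra base should be assigned.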
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FINGER (N -> C)


TRIPLET (5'->3')


F1


F2 
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ATC 






RXDAXXQ 


CGG 
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QXGNXXR 
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DXSNXXR 




DXSNXXR 


GAG 


RXDNXXR 
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RXDNXXR 


RXDNXXR 


GAT 


QXSNXXR 
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TXGNXXR 


TXGNXXR 




GCA 
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QXGDXXR 
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RXDEXXR 
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RXDTXXK 
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GGC 
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RXDSXXR 
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RXDAXXR 
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RXDNXXT 




TCG 


RXDDXXK 






TGT 




TXDHXXS 
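The helix patterns in this table, such as RXDNXXR for GAG, fix some of the seven helix positions and leave others open. A minimal Python sketch of checking a helix against such a pattern follows. It assumes that X here denotes any amino acid, as it does in the C2H2 motif given in the Background, and the function names motif_to_regex and helix_matches are hypothetical.

```python
import re


def motif_to_regex(motif: str) -> "re.Pattern[str]":
    """Compile a pattern such as 'RXDNXXR'; each X matches any one letter."""
    parts = ["[A-Z]" if ch == "X" else re.escape(ch) for ch in motif]
    return re.compile("".join(parts))


def helix_matches(motif: str, helix: str) -> bool:
    """Return True if the whole helix fits the pattern, position by position."""
    return motif_to_regex(motif).fullmatch(helix) is not None


if __name__ == "__main__":
    # RSDNLAR is listed in the tables above as a helix for a GAG triplet.
    print(helix_matches("RXDNXXR", "RSDNLAR"))
```

Because the match is anchored at both ends, a helix of the wrong length never fits a pattern.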
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